How do I replace a specific combination of integer and string? - python

I'm trying to replace the suit letters in a Python list of playing cards with suit symbols.
I have tried using Python's replace function, but I assume the best solution is probably based on regular expression replacement.
The desired result would be this:
"Ah" => "A♥"
"5h" => "5♥"
etc.
Currently the list features items like this:
[Player name], [Player wallet], [1st player card], [2nd player card]
For example:
["Don Johnson", 100, "Ks", "5d"]
["Davey Jones", 100, "4c", "3h"]
Any help for this would be greatly appreciated. Thanks.
(Edited for clarification on request - Thanks for all the input so far!)

Here, we can use four simple expressions, one per suit, to make the replacement:
([AKJQ0-9]{1,2})h
([AKJQ0-9]{1,2})d
and similarly the other two.
Demo
# coding=utf8
# the above tag defines encoding for this document and is for Python 2.x compatibility
import re
regex = r"([AKJQ0-9]{1,2})h"
test_str = ("Ah\n"
"10h")
subst = "\\1♥"
# You can manually specify the number of replacements by changing the 4th argument
result = re.sub(regex, subst, test_str, 0, re.MULTILINE)
if result:
    print(result)
# Note: for Python 2.7 compatibility, use ur"" to prefix the regex and u"" to prefix the test string and substitution.
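The four per-suit patterns can also be folded into a single pattern with a dictionary lookup. The following is only a sketch, assuming the rank and suit spelling shown in the question; the names SUITS, CARD_RE and to_symbol are made up for illustration.

```python
import re

# Suit letters and the symbols they map to (same mapping as in the question).
SUITS = {"h": "♥", "s": "♠", "c": "♣", "d": "♦"}

# A rank (2-10, J, Q, K, A) followed by exactly one suit letter.
CARD_RE = re.compile(r"(10|[2-9AKQJ])([hscd])")


def to_symbol(item):
    """Return the card with its suit symbol; pass anything else through."""
    if not isinstance(item, str):
        return item  # e.g. the wallet amount
    # fullmatch requires the whole string to be a card, so player names
    # that happen to contain "s" or "d" are left untouched.
    match = CARD_RE.fullmatch(item)
    if match is None:
        return item
    return match.group(1) + SUITS[match.group(2)]


player = ["Don Johnson", 100, "Ks", "5d"]
print([to_symbol(item) for item in player])
```

Because the whole item has to match, this can be mapped over the complete player record without indexing the card positions by hand.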

If you had just a list of cards then it would probably look something like this:
cards = ['2h', '2s', '2c', '2d', '3h', '3s', '3c', '3d', '4h', '4s', '4c', '4d', '5h', '5s', '5c', '5d', '6h', '6s', '6c', '6d', '7h', '7s', '7c', '7d', '8h', '8s', '8c', '8d', '9h', '9s', '9c', '9d', '10h', '10s', '10c', '10d', 'Ah', 'As', 'Ac', 'Ad', 'Kh', 'Ks', 'Kc', 'Kd', 'Jh', 'Js', 'Jc', 'Jd', 'Qh', 'Qs', 'Qc', 'Qd']
If so then just use a dict and a comprehension:
suits = {'h': '♥', 's': '♠', 'c': '♣', 'd': '♦'}
new_cards = [''.join(rank)+suits[suit] for *rank, suit in cards]
Output for this is:
['2♥', '2♠', '2♣', '2♦', '3♥', '3♠', '3♣', '3♦', '4♥', '4♠', '4♣', '4♦', '5♥', '5♠', '5♣', '5♦', '6♥', '6♠', '6♣', '6♦', '7♥', '7♠', '7♣', '7♦', '8♥', '8♠', '8♣', '8♦', '9♥', '9♠', '9♣', '9♦', '10♥', '10♠', '10♣', '10♦', 'A♥', 'A♠', 'A♣', 'A♦', 'K♥', 'K♠', 'K♣', 'K♦', 'J♥', 'J♠', 'J♣', 'J♦', 'Q♥', 'Q♠', 'Q♣', 'Q♦']
For your solution you could define a function that corrects the card:
def fix_card(card):
    suits = {'h': '♥', 's': '♠', 'c': '♣', 'd': '♦'}
    *rank, suit = card
    return ''.join(rank)+suits[suit]
Then just use it like this:
player = ["Don Johnson", 100, "Ks", "5d"]
player[2] = fix_card(player[2])
player[3] = fix_card(player[3])
print(player)
# ['Don Johnson', 100, 'K♠', '5♦']

No, regex is not needed for a simple replacement like this. Just use str.replace:
>>> cards = ['Ah', '5h']
>>> [s.replace('h', '♥') for s in cards]
['A♥', '5♥']
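A single str.replace call handles one suit. To cover all four suits in one pass, str.translate with a table built by str.maketrans is one option. This is a sketch that assumes, as in the question, that no rank uses the letters h, s, c or d; SUIT_TABLE is an illustrative name.

```python
# Translation table: each suit letter maps to its symbol.
SUIT_TABLE = str.maketrans("hscd", "♥♠♣♦")

cards = ["Ah", "5h", "Ks", "10c", "3d"]
converted = [card.translate(SUIT_TABLE) for card in cards]
print(converted)
```

This should only be applied to the card strings themselves, since a player name such as "Don Johnson" also contains those letters.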


Sorting a multidimensional array using merge sort?

I am trying to sort this multidimensional array by the number at the first index of each inner array using the merge sort algorithm, but I am very unsure how to do so.
This is the multidimensional array I am trying to sort:
Orders_db = [[1347517405, 54413, '78'], [1347517413, 54421, '86'], [1347517454, 54462, '127'], [1347517460, 54468, '133'], [1347517461, 54469, '134'], [1347517426, 54434, '99'], [1347517464, 54472, '137'], [1347517394, 54402, '67'], [1347517445, 54453, '118'], [1347517375, 54383, '48'], [1347517377, 54385, '50'], [1347517392, 54400, '65'], [1347517450, 54458, '123'], [1347517404, 54412, '77'], [1347517389, 54397, '62'], [1347517393, 54401, '66'], [1347517440, 54448, '113'], [1347517457, 54465, '130'], [1347517444, 54452, '117'], [1347517400, 54408, '73'], [1347517412, 54420, '85'], [1347517371, 54379, '44'], [1347517415, 54423, '88'], [1347517441, 54449, '114'], [1347517435, 54443, '108'], [1347517409, 54417, '82'], [1347517398, 54406, '71'], [1347517422, 54430, '95'], [1347517468, 54476, '141'], [1347517402, 54410, '75'], [1347517437, 54445, '110'], [1347517446, 54454, '119'], [1347517382, 54390, '55'], [1347517399, 54407, '72'], [1347517438, 54446, '111'], [1347517416, 54424, '89'], [1347517380, 54388, '53'], [1347517425, 54433, '98'], [1347517406, 54414, '79'], [1347517449, 54457, '122'], [1347517388, 54396, '61'], [1347517430, 54438, '103'], [1347517455, 54463, '128'], [1347517458, 54466, '131'], [1347517452, 54460, '125'], [1347517396, 54404, '69'], [1347517423, 54431, '96'], [1347517465, 54473, '138'], [1347517397, 54405, '70'], [1347517459, 54467, '132'], [1347517395, 54403, '68'], [1347517381, 54389, '54'], [1347517424, 54432, '97'], [1347517436, 54444, '109'], [1347517434, 54442, '107'], [1347517401, 54409, '74'], [1347517376, 54384, '49'], [1347517467, 54475, '140'], [1347517456, 54464, '129'], [1347517427, 54435, '100'], [1347517383, 54391, '56'], [1347517451, 54459, '124'], [1347517433, 54441, '106'], [1347517414, 54422, '87'], [1347517417, 54425, '90'], [1347517453, 54461, '126'], [1347517378, 54386, '51'], [1347517432, 54440, '105'], [1347517403, 54411, '76'], [1347517439, 54447, '112'], [1347517448, 54456, '121'], [1347517410, 54418, '83'], 
[1347517391, 54399, '64'], [1347517447, 54455, '120'], [1347517421, 54429, '94'], [1347517379, 54387, '52'], [1347517411, 54419, '84'], [1347517386, 54394, '59'], [1347517384, 54392, '57'], [1347517374, 54382, '47'], [1347517462, 54470, '135'], [1347517431, 54439, '104'], [1347517419, 54427, '92'], [1347517428, 54436, '101'], [1347517466, 54474, '139'], [1347517443, 54451, '116'], [1347517463, 54471, '136'], [1347517385, 54393, '58'], [1347517387, 54395, '60'], [1347517373, 54381, '46'], [1347517372, 54380, '45'], [1347517418, 54426, '91'], [1347517420, 54428, '93'], [1347517469, 54477, '142]'], [1347517442, 54450, '115'], [1347517408, 54416, '81'], [1347517390, 54398, '63'], [1347517407, 54415, '80'], [1347517429, 54437, '102']]
I can implement a general merge sort algorithm, but I cannot get it to sort by the number at the first index of each inner array.
My implementation of merge sort is:
def merge_sort(arr):
    if len(arr) > 1:
        mid = len(arr) // 2
        left = arr[:mid]
        right = arr[mid:]
        merge_sort(left)
        merge_sort(right)
        i = j = k = 0
        while i < len(left) and j < len(right):
            if left[i] < right[j]:
                arr[k] = left[i]
                i += 1
            else:
                arr[k] = right[j]
                j += 1
            k += 1
        while i < len(left):
            arr[k] = left[i]
            i += 1
            k += 1
        while j < len(right):
            arr[k] = right[j]
            j += 1
            k += 1
    return arr
How can I fit the general merge sort algorithm to sort this array?
Ideally the result is the array in ascending order, with the highest number at the end.
Try below:
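def merge(left, right):
    # If either half is empty, the other half is already the merged result.
    if not len(left) or not len(right):
        return left or right
    result = []
    i, j = 0, 0
    while (len(result) < len(left) + len(right)):
        # Compare on the first element of each inner list.
        if left[i][0] < right[j][0]:
            result.append(left[i])
            i += 1
        else:
            result.append(right[j])
            j += 1
        # Once one half is exhausted, append the rest of the other half.
        if i == len(left) or j == len(right):
            result.extend(left[i:] or right[j:])
            break
    return result
def mergesort(items):
    if len(items) < 2:
        return items
    # Integer division: in Python 3, "/" returns a float, which cannot be used as a slice index.
    middle = len(items) // 2
    left = mergesort(items[:middle])
    right = mergesort(items[middle:])
    return merge(left, right)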
seq = [[1347517405, 54413, '78'], [1347517413, 54421, '86'], [1347517454, 54462, '127'], [1347517460, 54468, '133'], [1347517461, 54469, '134'], [1347517426, 54434, '99'], [1347517464, 54472, '137'], [1347517394, 54402, '67'], [1347517445, 54453, '118'], [1347517375, 54383, '48'], [1347517377, 54385, '50'], [1347517392, 54400, '65'], [1347517450, 54458, '123'], [1347517404, 54412, '77'], [1347517389, 54397, '62'], [1347517393, 54401, '66'], [1347517440, 54448, '113'], [1347517457, 54465, '130'], [1347517444, 54452, '117'], [1347517400, 54408, '73'], [1347517412, 54420, '85'], [1347517371, 54379, '44'], [1347517415, 54423, '88'], [1347517441, 54449, '114'], [1347517435, 54443, '108'], [1347517409, 54417, '82'], [1347517398, 54406, '71'], [1347517422, 54430, '95'], [1347517468, 54476, '141'], [1347517402, 54410, '75'], [1347517437, 54445, '110'], [1347517446, 54454, '119'], [1347517382, 54390, '55'], [1347517399, 54407, '72'], [1347517438, 54446, '111'], [1347517416, 54424, '89'], [1347517380, 54388, '53'], [1347517425, 54433, '98'], [1347517406, 54414, '79'], [1347517449, 54457, '122'], [1347517388, 54396, '61'], [1347517430, 54438, '103'], [1347517455, 54463, '128'], [1347517458, 54466, '131'], [1347517452, 54460, '125'], [1347517396, 54404, '69'], [1347517423, 54431, '96'], [1347517465, 54473, '138'], [1347517397, 54405, '70'], [1347517459, 54467, '132'], [1347517395, 54403, '68'], [1347517381, 54389, '54'], [1347517424, 54432, '97'], [1347517436, 54444, '109'], [1347517434, 54442, '107'], [1347517401, 54409, '74'], [1347517376, 54384, '49'], [1347517467, 54475, '140'], [1347517456, 54464, '129'], [1347517427, 54435, '100'], [1347517383, 54391, '56'], [1347517451, 54459, '124'], [1347517433, 54441, '106'], [1347517414, 54422, '87'], [1347517417, 54425, '90'], [1347517453, 54461, '126'], [1347517378, 54386, '51'], [1347517432, 54440, '105'], [1347517403, 54411, '76'], [1347517439, 54447, '112'], [1347517448, 54456, '121'], [1347517410, 54418, '83'], [1347517391, 
54399, '64'], [1347517447, 54455, '120'], [1347517421, 54429, '94'], [1347517379, 54387, '52'], [1347517411, 54419, '84'], [1347517386, 54394, '59'], [1347517384, 54392, '57'], [1347517374, 54382, '47'], [1347517462, 54470, '135'], [1347517431, 54439, '104'], [1347517419, 54427, '92'], [1347517428, 54436, '101'], [1347517466, 54474, '139'], [1347517443, 54451, '116'], [1347517463, 54471, '136'], [1347517385, 54393, '58'], [1347517387, 54395, '60'], [1347517373, 54381, '46'], [1347517372, 54380, '45'], [1347517418, 54426, '91'], [1347517420, 54428, '93'], [1347517469, 54477, '142]'], [1347517442, 54450, '115'], [1347517408, 54416, '81'], [1347517390, 54398, '63'], [1347517407, 54415, '80'], [1347517429, 54437, '102']]
print("Given array is")
print(seq)
print("\n")
print("Sorted array is")
print(mergesort(seq))
If you change left[i][0] < right[j][0] to left[i][1] < right[j][1], then it will sort according to the second element of each inner array.
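Rather than editing the index inside merge each time, the comparison can be parameterised with a key function, in the same spirit as the key argument of the built-in sorted. This is a minimal sketch, not the answer's code: the key parameter and the shortened orders list are assumptions made for illustration.

```python
def merge_sort(items, key=lambda item: item):
    """Return a new list sorted in ascending order of key(item)."""
    if len(items) < 2:
        return list(items)
    mid = len(items) // 2
    left = merge_sort(items[:mid], key)
    right = merge_sort(items[mid:], key)

    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        # "<=" keeps elements with equal keys in their original order.
        if key(left[i]) <= key(right[j]):
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    # At most one of these two slices is non-empty.
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged


# A shortened stand-in for the Orders_db list from the question.
orders = [[1347517405, 54413, '78'], [1347517371, 54379, '44'], [1347517413, 54421, '86']]
by_first = merge_sort(orders, key=lambda row: row[0])
by_second = merge_sort(orders, key=lambda row: row[1])
print(by_first)
```

Sorting on the third column would need key=lambda row: int(row[2]), because those values are strings.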

How to get subsequent keys from a dictionary in python?

I'm working on a referencing problem in which I need to get a key from a dictionary along with its child keys (if any).
For example, I have a list of dictionary keys such as:
dict_keys(['1.', '1.1', '1.2', '1.2.1', '1.2.2', '1.2.2(a)', '1.2.2(b)', '1.2.2(c)', '1.2.2(d)', '1.2.3', '1.2.4', '1.2.5', '1.2.6', '1.2.7', '1.2.8', '1.2.9', '1.2.10', '1.2.11', '2.', '2.1', '3.', '3.1', '3.1.1', '3.1.2', '3.2', '3.3', '3.4', '3.5', '3.5.1', '3.5.2', '3.5.2(a)', '3.5.2(b)', '3.5.2(c)', '3.6', '3.7', '3.8', '3.9', '3.9.1', '3.9.2', '3.9.3', '3.10', '3.11', '3.11.1', '3.11.2', '4.', '4.1', '4.1.1', '4.1.2', '4.1.3', '4.1.4', '4.1.5', '4.1.6', '4.1.7', '4.1.8', '4.1.9', '4.1.10', '4.2', '5.', '5.1', '5.1.1', '5.1.2', '5.1.3', '5.1.4', '5.1.4(a)', '5.1.4(b)', '5.1.4(c)', '5.1.4(d)', '5.1.4(e)', '5.1.4(f)', '5.1.4(g)', '5.1.4(h)', '5.1.4(i)', '5.1.4(j)', '5.1.4(k)', '5.2', '5.3', '5.4', '6.', '6.1', '6.2', '7.', '7.1', '7.2', '7.3', '7.3.1', '7.3.2', '7.3.3', '7.4', '7.5', '7.6', '7.6.1', '7.6.2', '7.6.3', '7.6.4', '8.', '8.1', '8.2', '8.2.1', '8.2.2', '8.2.3', '8.2.4', '8.2.5', '8.2.6', '8.2.7', '8.3', '8.3.1', '8.3.2', '8.3.3', '8.3.4', '8.3.5', '8.3.6', '8.4', '9.', '9.1', '9.1.1', '9.1.2', '9.2', '9.3', '9.4', '9.5', '10.', '10.1', '10.2', '10.3', '10.4', '11.', '11.1', '11.2', '11.2.1', '11.2.2', '11.2.3', '11.3', '11.4', '11.5', '11.6', '11.7', '11.8', '11.9', '11.10', '12.', '12.1', '12.2', '12.2.1', '12.2.2', '12.3', '12.4', '12.4.1', '12.4.2', '12.4.3', '12.5', '12.6', '12.7', '12.8', '12.9', '12.10', '12.11', '13.', '13.1', '13.2', '13.3', '13.3.1', '13.3.2', '13.3.3', '13.3.4', '13.4', '13.4.1', '13.4.2', '13.4.3', '14.', '14.1', '14.2', '14.3', '15.', '15.1', '15.2', '15.2.1', '15.2.2', '15.2.3', '15.2.4', '16.', '16.1', '16.2', '16.3', '16.4', '16.5', '16.6', '16.7', '16.7.1', '16.7.2', '17.', '17.1', '17.1.1', '17.1.2', '17.1.3', '17.1.4', '17.2', '17.2.1', '17.2.2', '17.2.3', '17.2.4', '17.3', '17.4', '17.4.1', '17.4.2', '17.5', '17.5.1', '17.5.2', '17.5.3', '18.', '18.1', '18.2', '18.2.1', '18.2.1(a)', '18.2.1(b)', '18.2.1(c)', '18.2.2', '18.2.3', '18.2.4', '18.2.5', '18.2.5(a)', '18.2.5(b)', '18.2.6', '18.2.7', '19.', '19.1', '19.2', 
'19.3', '19.4', '20.', '21.', '21.1', '21.2', '21.3', '21.4', '22.', '22.1', '22.1.1', '22.1.2', '22.1.3', '22.1.4', '22.2', '22.2.1', '22.2.2', '22.3', '22.4', '22.5', '22.6', '22.6.1', '22.6.2', '22.7', '23.', '23.1', '23.1.1', '23.1.1(a)', '23.1.1(b)', '23.1.1(c)', '23.1.2', '23.1.3', '23.1.3(a)', '23.1.3(b)', '23.1.4', '23.2', '23.2.1', '23.2.2', '23.3', '23.4', '23.5', '24.', '24.1', '24.1.1', '24.1.2', '24.1.3', '24.1.4', '24.1.5', '24.1.6', '24.2', '24.3', '25.', '25.1', '25.1.1', '25.1.2', '25.1.3', '25.1.4', '25.1.5', '25.1.6', '25.2', '25.2.1', '25.2.2', '25.2.3', '25.2.4', '25.2.5', '25.3', '26.', '27.', '27.1', '27.2', '27.2.1', '27.2.2', '27.2.3', '27.3', '28.', '28.1', '28.2', '28.2.1', '28.2.2', '28.2.3', '28.2.4', '28.2.5', '28.3', '29.', '29.1', '29.1.1', '29.1.2', '29.1.3', '29.2', '30.', '30.1', '30.2', '30.2.1', '30.2.2', '30.2.3', '30.3', '30.4', '31.', '31.1', '31.2', '31.3', '31.4', '31.5', '31.5.1', '31.5.2', '31.6', '31.7', '31.8', '31.8.1', '31.8.2', '31.8.3', '31.9', '31.10', '31.11'])
Here, if something refers to key 3.1, I also want to extract the data for its child keys 3.1.1 and 3.1.2. For this I used the str.startswith() method, which fails in cases like this one.
It returns not only the child clauses but also other clauses such as 3.10 and 3.11, which also start with 3.1, so it produces false positives.
The caveat, however, is that I don't know for sure which numbering system the user will be using in the agreement.
For example, it could be anything like:
['(1.)', '(1.)a.', '(2.)'] or ['1.', '1.1.', '1.1.1.', '1.1.1.a',] or ['1.', '1.1', '1.1.1', '1.1.1a']
So I'm trying to write a function that covers all of these numbering schemes.
How can I do that?
Thanks in advance! :)
You're much more familiar with your dataset and how robust you need to be, but I assigned that dict_keys collection to keys and used the following:
if key.count('.') < 2:
    child_keys = [k for k in keys if k.startswith(key + '.')]
else:
    child_keys = [k for k in keys if k.startswith(key + '(')]
If key = '1.2.2', then child_keys = ['1.2.2(a)', '1.2.2(b)', '1.2.2(c)', '1.2.2(d)']. If key = '3.1', then child_keys = ['3.1.1', '3.1.2'].
import re

# `structure` (the dict of clauses) and `logger` are assumed to be defined elsewhere.
def get_subsequent_keys(key):
    subsequent_keys = []
    total_keys = structure.keys()
    try:
        if key[-1] == '.' or key[-1] == ')':
            # Key already ends with a separator, so a plain prefix match is safe.
            if key in total_keys:
                pass
            elif key[:-1] in total_keys:
                subsequent_keys.append(key[:-1])
            subsequent_keys += [x for x in total_keys if x.startswith(key)]
        else:
            # Escape the key so that '.' and '(' are matched literally.
            key_pat = fr"({re.escape(key)}[a-z]+)"
            subsequent_keys.append(key)
            subsequent_keys += [x for x in total_keys if x.startswith(key + '.')]
            subsequent_keys += [x for x in total_keys if x.startswith(key + '(')]
            subsequent_keys += re.findall(key_pat, ' '.join(total_keys))
    except Exception as exception:
        logger.debug(exception)
    finally:
        return subsequent_keys
So I used this approach to cover all the possible cases.
The problem with my use case is that I don't really have a definite way by which the user numbers the clauses in the agreement. Hence I needed a generic function to cover all ground. Also, thanks to Ava and Hayley for their help! :D
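The 3.1 versus 3.10 false positive can also be avoided without knowing the separator in advance, by looking at the character that follows the prefix. The sketch below is a heuristic, not a general solution: the function name child_keys and the shortened key list are made up for illustration, and it has only been reasoned through for the numbering styles listed in the question.

```python
def child_keys(key, keys):
    """Return the keys that look like descendants of key."""
    children = []
    for candidate in keys:
        if candidate == key or not candidate.startswith(key):
            continue
        # candidate is strictly longer than key here, so this index exists.
        next_char = candidate[len(key)]
        # If key ends in a digit and the candidate continues with another
        # digit, it is a different number (3.1 vs 3.10), not a child.
        if key[-1].isdigit() and next_char.isdigit():
            continue
        children.append(candidate)
    return children


keys = ['3.', '3.1', '3.1.1', '3.1.2', '3.2', '3.10', '3.11', '3.11.1',
        '1.2.2', '1.2.2(a)', '1.2.2(b)', '1.2.3']
print(child_keys('3.1', keys))
print(child_keys('1.2.2', keys))
```

Keys that already end in a separator, such as '3.' or '(1.)', fall through to the plain prefix match.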

find words that can be made from a string in python

I'm fairly new to Python and I'm not sure how to tackle my problem.
I'm trying to make a program that takes a string of 15 characters from a .txt file, finds the words that can be made from those characters using a dictionary file, then outputs those words to another text file.
This is what I have tried:
attempting to find words that don't contain the characters and removing them from the list
various anagram-solver programs from GitHub
I tried sudo pip3 install anagram-solver, but it has been 3 hours on 15 characters and it is still running
I'm new, so please tell me if I'm forgetting something.
If you're looking for "perfect" anagrams, i.e. those that contain exactly the same number of characters, not a subset, it's pretty easy:
take your word-to-find, sort it by its letters
take your dictionary, sort each word by its letters
if the sorted versions match, they're anagrams
def find_anagrams(seek_word):
    sorted_seek_word = sorted(seek_word.lower())
    for word in open("/usr/share/dict/words"):
        word = word.strip()  # remove trailing newline
        sorted_word = sorted(word.lower())
        if sorted_word == sorted_seek_word and word != seek_word:
            print(seek_word, word)
if __name__ == "__main__":
    find_anagrams("begin")
    find_anagrams("nicer")
    find_anagrams("decor")
prints (on my macOS machine – Windows machines won't have /usr/share/dict/words by default, and some Linux distributions need it installed separately)
begin being
begin binge
nicer cerin
nicer crine
decor coder
decor cored
decor Credo
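If many words need to be looked up, re-reading and re-sorting the dictionary for each one is wasteful. One possible variation is to build an index once, keyed by the sorted letters. This is a sketch under assumptions: build_anagram_index and lookup_anagrams are illustrative names, and a tiny in-memory word list stands in for the dictionary file.

```python
from collections import defaultdict


def build_anagram_index(words):
    """Map each sorted-letter signature to the words that share it."""
    index = defaultdict(list)
    for word in words:
        signature = "".join(sorted(word.lower()))
        index[signature].append(word)
    return index


def lookup_anagrams(index, seek_word):
    """Return the indexed words that are anagrams of seek_word."""
    signature = "".join(sorted(seek_word.lower()))
    # .get() avoids creating an empty entry for unknown signatures.
    return [word for word in index.get(signature, []) if word != seek_word]


# A tiny stand-in word list; a real run would read a dictionary file instead.
words = ["being", "binge", "begin", "coder", "cored", "decor", "nicer"]
index = build_anagram_index(words)
print(lookup_anagrams(index, "begin"))
print(lookup_anagrams(index, "decor"))
```

Each lookup is then a single dictionary access instead of a pass over the whole word list.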
EDIT
A second variation that finds all words that are assemblable from the letters in the original word, using collections.Counter:
import collections
def find_all_anagrams(seek_word):
    seek_word_counter = collections.Counter(seek_word.lower())
    for word in open("/usr/share/dict/words"):
        word = word.strip()  # remove trailing newline
        word_counter = collections.Counter(word.strip())
        if word != seek_word and all(
            n <= seek_word_counter[l] for l, n in word_counter.items()
        ):
            yield word
if __name__ == "__main__":
    print("decoration", set(find_all_anagrams("decoration")))
Outputs e.g.
decoration {'carte', 'drona', 'roit', 'oat', 'cantred', 'rond', 'rid', 'centroid', 'trine', 't', 'tenai', 'cond', 'toroid', 'recon', 'contra', 'dain', 'cootie', 'iao', 'arctoid', 'oner', 'indart', 'tine', 'nace', 'rident', 'cerotin', 'cran', 'eta', 'eoan', 'cardoon', 'tone', 'trend', 'trinode', 'coaid', 'ranid', 'rein', 'end', 'actine', 'ide', 'cero', 'iodate', 'corn', 'oer', 'retia', 'nidor', 'diter', 'drat', 'tec', 'tic', 'creat', 'arent', 'coon', 'doater', 'ornoite', 'terna', 'docent', 'tined', 'edit', 'octroi', 'eric', 'read', 'toned', 'c', 'tera', 'can', 'rocta', 'cortina', 'adonite', 'iced', 'no', 'natr', 'net', 'oe', 'rodeo', 'actor', 'otarine', 'on', 'cretin', 'ericad', 'dance', 'tornade', 'tinea', 'coontie', 'anerotic', 'acrite', 'ra', 'danio', 'inroad', 'inde', 'tied', 'tar', 'coronae', 'tid', 'rad', 'doc', 'derat', 'tea', 'acerin', 'ronde', 'recti', 'areito', 'drain', 'odontic', 'octoad', 'rio', 'actin', 'tread', 'rect', 'ariot', 'road', 'doctrine', 'enactor', 'indoor', 'toco', 'ton', 'trice', 'norite', 'nea', 'coda', 'noria', 'rot', 'trona', 'rice', 'arite', 'eria', 'orad', 'rate', 'toed', 'enact', 'crinet', 'cento', 'arid', 'coot', 'nat', 'nar', 'cain', 'at', 'antired', 'ear', 'triode', 'doter', 'cedarn', 'orna', 'rand', 'tari', 'crea', 'tiar', 'retan', 'tire', 'cora', 'aroid', 'iron', 'tenio', 'enroot', 'd', 'oaric', 'acetin', 'tain', 'neat', 'noter', 'tien', 'aortic', 'tode', 'dicer', 'irate', 'tie', 'canid', 'ado', 'noticer', 'arn', 'nacre', 'ceration', 'ratine', 'denaro', 'cotoin', 'aint', 'canto', 'cinter', 'decani', 'roon', 'donor', 'acnode', 'aide', 'doer', 'tacnode', 'oread', 'acetoin', 'rine', 'acton', 'conoid', 'a', 'otocrane', 'norate', 'care', 'ticer', 'io', 'detain', 'cedar', 'ta', 'toadier', 'atone', 'cornet', 'dacoit', 'toric', 'orate', 'arni', 'adroit', 'rend', 'tanier', 'rooted', 'doit', 'dier', 'odorate', 'trica', 'rated', 'cotonier', 'dine', 'roid', 'cairned', 'cat', 'i', 'coin', 'octine', 'trod', 'orc', 'cardo', 'eniac', 'arenoid', 
'erd', 'creant', 'oda', 'ratio', 'ceria', 'ad', 'acorn', 'dorn', 'deric', 'credit', 'door', 'cinder', 'cantor', 'er', 'doon', 'coner', 'donate', 'roe', 'tora', 'antic', 'racoon', 'ooid', 'noa', 'tae', 'coroa', 'earn', 'retain', 'canted', 'norie', 'rota', 'tao', 'redan', 'rondo', 'entia', 'ctenoid', 'cent', 'daroo', 'inrooted', 'roed', 'adore', 'coat', 'e', 'rat', 'deair', 'arend', 'coir', 'acid', 'coronate', 'rodent', 'acider', 'iota', 'codo', 'redaction', 'cot', 'aeric', 'tonic', 'candier', 'decart', 'dicta', 'dot', 'recoat', 'caroon', 'rone', 'tarie', 'tarin', 'teca', 'oar', 'ocrea', 'ante', 'creation', 'tore', 'conto', 'tairn', 'roc', 'conter', 'coeditor', 'certain', 'roncet', 'decator', 'not', 'coatie', 'toran', 'caid', 'redia', 'root', 'cad', 'cartoon', 'n', 'coed', 'cand', 'neo', 'coronadite', 'dare', 'dartoic', 'acoin', 'detar', 'dite', 'trade', 'train', 'ordinate', 'racon', 'citron', 'dan', 'doat', 'nito', 'tercia', 'rote', 'cooer', 'acone', 'rita', 'caret', 'dern', 'enatic', 'too', 'cried', 'tade', 'dit', 'orient', 'ria', 'torn', 'coati', 'cnida', 'note', 'tried', 'acrid', 'nitro', 'acron', 'tern', 'one', 'it', 'naio', 'dor', 'ea', 'ca', 'ire', 'inert', 'orcanet', 'cine', 'coe', 'nardoo', 'deota', 'den', 'toi', 'adion', 'to', 'rite', 'nectar', 'rane', 'riant', 'cod', 'de', 'adit', 'airt', 'ie', 'retin', 'toon', 'cane', 'aeon', 'are', 'cointer', 'actioner', 'crin', 'detrain', 'art', 'cant', 'ort', 'tored', 'antoeci', 'tier', 'cite', 'onto', 'coater', 'tranced', 'atonic', 'roi', 'in', 'roan', 'decoat', 'rain', 'cronet', 'ronco', 'dont', 'citer', 'redact', 'cider', 'nor', 'octan', 'ration', 'doina', 'rie', 'aero', 'noted', 'crate', 'crain', 'cadet', 'condite', 'ran', 'odeon', 'date', 'eat', 'intoed', 'cation', 'carone', 'ratoon', 'retina', 'tiao', 'nice', 'nodi', 'codon', 'coo', 'torc', 'dent', 'entad', 'ne', 'toe', 'dae', 'decant', 'redcoat', 'coiner', 'irade', 'air', 'oint', 'coronet', 'radon', 'ce', 'octonare', 'oaten', 'citrean', 'dice', 'dancer', 
'carotid', 'cretion', 'don', 'cion', 'nei', 'tead', 'nori', 'nacrite', 'ootid', 'rancid', 'dornic', 'orenda', 'cairn', 'aroon', 'coardent', 'aider', 'notice', 'cored', 'adorn', 'tad', 'carid', 'otic', 'dian', 'od', 'dint', 'tercio', 'die', 'conred', 'tice', 'rant', 'candor', 'anti', 'dar', 'antre', 'cornea', 'ordain', 'corona', 'recta', 'redo', 'tare', 'coranto', 'action', 'caird', 'creta', 'naid', 'tri', 'acre', 'crane', 'coated', 'citronade', 'anoetic', 'tenor', 'anode', 'triad', 'ceratoid', 'rod', 'idea', 'carton', 'cortin', 'endaortic', 'dicot', 'tend', 'da', 'tod', 'erotica', 'cord', 'coreid', 'toader', 'dace', 'tan', 'editor', 'rection', 'toner', 'cone', 'ni', 'tide', 'coder', 'din', 'ocote', 'ore', 'daer', 'octane', 'darn', 'do', 'reit', 'na', 'catenoid', 'tron', 'condor', 'crinated', 'cordon', 'crone', 'toad', 'noir', 'into', 'tirade', 'nadir', 'ant', 'ade', 'droit', 'icon', 'drone', 'ared', 'cardin', 'nid', 'dire', 'orcin', 'donator', 'rani', 'tane', 'ace', 'iodo', 'doria', 'ride', 'eon', 'ornate', 'cedrat', 'aire', 'carotin', 'dation', 'tear', 'onca', 'cote', 'taroc', 'con', 'nod', 'dinero', 'ecad', 'recant', 'ae', 'octad', 'cor', 'doctor', 'acridone', 'neti', 'cordite', 'crotin', 'aneroid', 'diota', 'coorie', 'dita', 'aconite', 'nard', 'cadent', 'ectad', 'rance', 'rea', 'tai', 'denat', 'rood', 'acne', 'decan', 'ani', 'rit', 'cit', 'cetin', 'odor', 'acorned', 'iceroot', 'inro', 'crood', 'daric', 'dacite', 'trone', 'acier', 'reina', 'oncia', 'drant', 'acrodont', 'nacred', 'cotrine', 'dinar', 'tean', 'atoner', 'toorie', 'nadorite', 'cardon', 'taen', 'tin', 'conte', 'acoine', 'dater', 'diact', 'aid', 'anodic', 'coronated', 'direct', 're', 'era', 'anticor', 'triace', 'octoid', 'dao', 'corta', 'edict', 'trode', 'ode', 'orant', 'niter', 'centrad', 'cater', 'tronc', 'coronad', 'r', 'toro', 'ar', 'once', 'ora', 'trace', 'creodont', 'erotic', 'ai', 'troca', 'ion', 'tecon', 'tra', 'acor', 'radio', 'acred', 'croon', 'tricae', 'recto', 'riden', 'andorite', 'taro', 
'red', 'dear', 'ate', 'tinder', 'trin', 'deacon', 'ardent', 'aer', 'arc', 'crine', 'dart', 'diet', 'riot', 'tanrec', 'tor', 'noetic', 'ret', 'trance', 'ona', 'rind', 'coto', 'daoine', 'teind', 'toa', 'inter', 'code', 'cart', 'aion', 'detin', 'core', 'oont', 'rent', 'cedrin', 'card', 'trained', 'o', 'recoin', 'cro', 'and', 'diner', 'id', 'cordant', 'cedron', 'ditone', 'odic', 'cadi', 'cerin', 'nit', 'ecoid', 'nide', 'ean', 'andric', 'tind', 'raid', 'crena', 'oroide', 'roadite', 'canter', 'idant', 'cade', 'race', 'ten', 'caner', 'tarn', 'cooter', 'etna', 'tornadic', 'irone', 'ice', 'en', 'oord', 'oared', 'draine', 'cordate', 'react', 'reaction', 'tornado', 'troco', 'niota', 'carotenoid', 'an', 'cader', 'naric', 'car', 'centiar', 'ti', 'cearin', 'aroint', 'crined', 'iter', 'di', 'or', 'trio', 'dari', 'oration', 'orcein', 'coned', 'odorant', 'dean', 'coadore', 'cate', 'drate', 'dirten', 'ted', 'done', 'cadre', 'ocean', 'tired', 'adet', 'dirt', 'te', 'nae', 'ceti', 'cern', 'rotan', 'doe', 'roto', 'dote', 'node', 'ait', 'act', 'canoe', 'rode'}
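The per-letter count check can also be written with Counter subtraction, which drops any letter whose count falls to zero or below. The sketch below uses an in-memory candidate list so that it runs anywhere; words_from_letters and the sample data are illustrative assumptions, not part of the answer above.

```python
from collections import Counter


def words_from_letters(letters, words):
    """Return the words that can be spelled using only the given letters."""
    available = Counter(letters.lower())
    found = []
    for word in words:
        needed = Counter(word.lower())
        # Counter subtraction keeps only positive counts, so the result is
        # empty exactly when every needed letter is available often enough.
        if not (needed - available):
            found.append(word)
    return found


candidates = ["cat", "act", "coat", "taco", "attic", "dog"]
print(words_from_letters("tacox", candidates))
```

For the setup described in the question, the letters would be read from the input .txt file, the candidates from the dictionary file, and the returned list written to the output file.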

Move item from list to another one for poker card game Python

I'm trying to make a poker game in Python. In the while loop I want to move the used cards into a separate (used cards) list. The problem is that when I print the hand I sometimes get duplicates. Something is wrong with my strategy for moving the cards and I don't know what. Can you help me?
import random
deck = ['AS', 'KS', 'QS', 'JS', '10S', '9S', '8S', '7S', '6S', '5S', '4S', '3S', '2S',\
'AD', 'KD', 'QD', 'JD', '10D', '9D', '8D', '7D', '6D', '5D', '4D', '3D', '2D',\
'AC', 'KC', 'QC', 'JC', '10C', '9C', '8C', '7C', '6C', '5C', '4C', '3C', '2C',\
'AH', 'KH', 'QH', 'JH', '10H', '9H', '8H', '7H', '6H', '5H', '4H', '3H', '2H']
used = []
p1 = []
p2 = []
a = 0
while (a < 2):
    drawn_card = random.choice(deck)
    deck.append(drawn_card)
    deck = [f for f in deck if f not in used]
    p1.append(drawn_card)
    a+=1
random.choice is not guaranteed to return a different card on each call, so when you do:
drawn_card = random.choice(deck)
..
p1.append(drawn_card)
you may end up with duplicates (which explains why you sometimes see duplicates and sometimes don't).
Check first whether drawn_card is already in the list, and append it only if it is not. That way you won't have duplicates. In code you could do it like this:
if drawn_card not in p1:
    p1.append(drawn_card)
Or, as Rory Daulton said:
If you are allowed, you could shuffle the entire deck, then remove consecutive items from that list.
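The shuffle suggestion could look roughly like this. It is a sketch, assuming two players with two cards each; the deck is rebuilt from ranks and suits only to keep the example short.

```python
import random

ranks = ["A", "K", "Q", "J", "10", "9", "8", "7", "6", "5", "4", "3", "2"]
suits = ["S", "D", "C", "H"]
deck = [rank + suit for suit in suits for rank in ranks]

random.shuffle(deck)  # randomise the order once, in place

# pop() removes the card from the deck, so it cannot be dealt again.
p1 = [deck.pop() for _ in range(2)]
p2 = [deck.pop() for _ in range(2)]
used = p1 + p2

print(p1, p2, len(deck))
```

Because every dealt card leaves the deck, no membership check is needed and the used list is just a record of what has been dealt.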
You need to compare the random card with p1, not with deck:
import random
deck = ['AS', 'KS', 'QS', 'JS', '10S', '9S', '8S', '7S', '6S', '5S', '4S', '3S', '2S',\
'AD', 'KD', 'QD', 'JD', '10D', '9D', '8D', '7D', '6D', '5D', '4D', '3D', '2D',\
'AC', 'KC', 'QC', 'JC', '10C', '9C', '8C', '7C', '6C', '5C', '4C', '3C', '2C',\
'AH', 'KH', 'QH', 'JH', '10H', '9H', '8H', '7H', '6H', '5H', '4H', '3H', '2H']
used = []
p1 = []
a = 0
while (a < 2):
    drawn_card = random.choice(deck)
    print(drawn_card)
    if drawn_card not in p1:
        p1.append(drawn_card)
        a += 1
    continue
print (p1)

Find a list of things (e.g. a list of rivers) using NLTK

I would like to get lists of things, for example a list of names of rivers or a list of types of animals.
NLTK looks like it might be the thing for this, but I'm not sure how to do what I want. I'd like to have a function like:
get_list_of("river")
that would return something like
["amazon", "mississippi", "thames", ...]
I would suggest looking at the NLTK WordNet API; see http://www.nltk.org/howto/wordnet.html.
But after some digging, it seems that proper nouns (i.e. names of rivers) are not easy to track down in WordNet:
>>> from nltk.corpus import wordnet as wn
>>> wn.synsets('river')
[Synset('river.n.01')]
>>> wn.synset('river.n.01')
Synset('river.n.01')
>>> wn.synset('river.n.01').lemma_names()  # a method call in NLTK 3.x
['river']
>>> wn.synsets('amazon')
[Synset('amazon.n.01'), Synset('amazon.n.02'), Synset('amazon.n.03'), Synset('amazon.n.04')]
>>> wn.synset('amazon.n.01').definition()
'a large strong and aggressive woman'
>>> wn.synset('amazon.n.02').definition()
'(Greek mythology) one of a nation of women warriors of Scythia (who burned off the right breast in order to use a bow and arrow more effectively)'
>>> wn.synset('amazon.n.03').definition()
"a major South American river; arises in the Andes and flows eastward into the South Atlantic; the world's 2nd longest river (4000 miles)"
>>> wn.synset('amazon.n.04').definition()
'mainly green tropical American parrots'
As a brute-force approach, look for "river" in each synset's definition, like this:
from itertools import chain
# lemma_names() and definition() are methods in NLTK 3.x
list(chain(*[i.lemma_names() for i in wn.all_synsets() if "river" in i.definition()]))
[out]:
['anaclinal', 'cataclinal', 'Acheronian', 'Acherontic', 'Stygian', 'hit-and-run', 'fluvial', 'riparian', 'Lao', 'debouch', 'rejuvenate', 'drive', 'ford', 'ascend', 'plant', 'drive', 'ford', 'fording', 'drive', 'driving', 'flood_control', 'conservancy', 'road_rage', 'Aegospotami', 'Aegospotamos', 'Yalu_River', 'three-spined_stickleback', 'Gasterosteus_aculeatus', 'ten-spined_stickleback', 'Gasterosteus_pungitius', 'placoderm', 'hellbender', 'mud_puppy', 'Cryptobranchus_alleganiensis', 'plains_spadefoot', 'Scaphiopus_bombifrons', 'mud_turtle', 'cooter', 'river_cooter', 'Pseudemys_concinna', 'spiny_softshell', 'Trionyx_spiniferus', 'smooth_softshell', 'Trionyx_muticus', 'teal', 'pintail', 'pin-tailed_duck', 'Anas_acuta', 'Ancylus', 'genus_Ancylus', 'freshwater_mussel', 'freshwater_clam', 'long-clawed_prawn', 'river_prawn', 'Palaemon_australis', 'Platanistidae', 'family_Platanistidae', 'hippopotamus', 'hippo', 'river_horse', 'Hippopotamus_amphibius', 'waterbuck', 'Australian_lungfish', 'Queensland_lungfish', 'Neoceratodus_forsteri', 'alewife', 'Alosa_pseudoharengus', 'Pomolobus_pseudoharengus', 'sockeye', 'sockeye_salmon', 'red_salmon', 'blueback_salmon', 'Oncorhynchus_nerka', 'brown_trout', 'salmon_trout', 'Salmo_trutta', 'Australian_arowana', 'Dawson_River_salmon', 'saratoga', 'spotted_barramundi', 'spotted_bonytongue', 'Scleropages_leichardti', 'Australian_bonytongue', 'northern_barramundi', 'Scleropages_jardinii', 'crappie', 'striped_bass', 'striper', 'Roccus_saxatilis', 'rockfish', 'bolti', 'Tilapia_nilotica', 'Chinese_paddlefish', 'Psephurus_gladis', 'air_bag', 'Augean_stables', 'barouche', 'bend', 'curve', 'boathouse', 'box', 'box_seat', 'brassie', 'bridge', 'span', 'bridle', 'brougham', 'buggy_whip', 'cab', 'car_mirror', 'coach', 'four-in-hand', 'coach-and-four', 'cockpit', 'death_seat', 'dredge', 'dredging_bucket', 'elbow', 'flat_tip_screwdriver', 'hansom', 'hansom_cab', 'keelboat', 'Lake_Volta', 'levee', 'levee', 'L-plate', 'machine_screw', 'outfall', 
'Phillips_screwdriver', 'pull-in', 'pull-up', 'river_boat', 'showboat', 'skidpan', 'spiral_ratchet_screwdriver', 'ratchet_screwdriver', 'towpath', 'towing_path', 'truck_stop', 'willowware', 'willow-pattern', 'woodscrew', 'Copehan', 'Volgaic', 'horn', 'rip', 'riptide', 'tide_rip', 'crosscurrent', 'countercurrent', 'crappie', 'red_salmon', 'sockeye', 'sockeye_salmon', 'logjam', 'Teamsters_Union', 'car_pool', 'conservancy', 'headwater', 'river_basin', 'basin', 'watershed', 'drainage_basin', 'catchment_area', 'catchment_basin', 'drainage_area', 'confluence', 'meeting', 'Mammoth_Cave_National_Park', 'Zion_National_Park', 'watershed', 'water_parting', 'divide', 'Yangon', 'Rangoon', "N'Djamena", 'Ndjamena', 'Fort-Lamy', 'capital_of_Chad', 'Kinshasa', 'Leopoldville', 'Saxony', 'Sachsen', 'Saxe', 'Cologne', 'Koln', 'Mannheim', 'Rhineland', 'Rheinland', 'Ruhr', 'Ruhr_Valley', 'West_Bank', 'Pennines', 'Pennine_Chain', 'Ottawa', 'Canadian_capital', 'capital_of_Canada', 'Antwerpen', 'Antwerp', 'Anvers', 'Orleans', 'Rhone-Alpes', 'Friesland', 'Timbuktu', 'Bydgoszcz', 'Bromberg', 'Novosibirsk', 'Tbilisi', 'Tiflis', 'capital_of_Georgia', 'Toledo', 'Selma', 'Denver', 'Mile-High_City', 'capital_of_Colorado', 'Hartford', 'capital_of_Connecticut', 'Savannah', 'Topeka', 'capital_of_Kansas', 'Louisville', 'New_Orleans', 'Detroit', 'Motor_City', 'Motown', 'Minneapolis', 'Saint_Paul', 'St._Paul', 'capital_of_Minnesota', 'Jefferson_City', 'capital_of_Missouri', 'Saint_Louis', 'St._Louis', 'Gateway_to_the_West', 'Billings', 'Great_Falls', 'Omaha', 'Concord', 'capital_of_New_Hampshire', 'Manchester', 'Trenton', 'capital_of_New_Jersey', 'Albuquerque', 'New_Netherland', 'Albany', 'capital_of_New_York', 'Erie_Canal', 'New_York', 'New_York_City', 'Greater_New_York', 'West_Point', 'Niagara_Falls', 'Schenectady', 'Bismarck', 'capital_of_North_Dakota', 'Fargo', 'Cincinnati', 'Tulsa', 'Chester', 'Philadelphia', 'City_of_Brotherly_Love', 'Pierre', 'capital_of_South_Dakota', 'Mount_Vernon', 
'Charleston', 'capital_of_West_Virginia', 'Huntington', 'Morgantown', 'Parkersburg', 'Wheeling', 'Casper', 'Ciudad_Bolivar', 'Aare', 'Aar', 'Aare_River', 'Acheron', 'River_Acheron', 'Adige', 'River_Adige', 'Aire', 'River_Aire', 'Aire_River', 'Alabama', 'Alabama_River', 'Allegheny', 'Allegheny_River', 'Amazon', 'Amazon_River', 'Amur', 'Amur_River', 'Heilong_Jiang', 'Heilong', 'Angara', 'Angara_River', 'Tunguska', 'Upper_Tunguska', 'Apalachicola', 'Apalachicola_River', 'Araguaia', 'Araguaia_River', 'Araguaya', 'Araguaya_River', 'Aras', 'Araxes', 'Arauca', 'Argun', 'Argun_River', 'Ergun_He', 'Arkansas', 'Arkansas_River', 'Arno', 'Arno_River', 'River_Arno', 'Avon', 'River_Avon', 'Upper_Avon', 'Upper_Avon_River', 'Avon', 'River_Avon', 'bar', 'Bighorn', 'Bighorn_River', 'Big_Sioux_River', 'billabong', 'bluff', 'body_of_water', 'water', 'bottomland', 'bottom', 'Brahmaputra', 'Brahmaputra_River', 'branch', 'Brazos', 'Brazos_River', 'brook', 'creek', 'Caloosahatchee', 'Caloosahatchee_River', 'Cam', 'River_Cam', 'Cam_River', 'Canadian', 'Canadian_River', 'canyon', 'canon', 'Cape_Fear_River', 'channel', 'Chao_Phraya', 'Charles', 'Charles_River', 'Chattahoochee', 'Chattahoochee_River', 'Cimarron', 'Cimarron_River', 'Clinch_River', 'Clyde', 'Cocytus', 'River_Cocytus', 'Colorado', 'Colorado_River', 'Colorado', 'Colorado_River', 'Columbia', 'Columbia_River', 'Congo', 'Congo_River', 'Zaire_River', 'Connecticut', 'Connecticut_River', 'Coosa', 'Coosa_River', 'Cumberland', 'Cumberland_River', 'dale', 'Danube', 'Danube_River', 'Danau', 'Darling', 'Darling_River', 'Delaware', 'Delaware_River', 'delta', 'Demerara', 'Detroit_River', 'distributary', 'Dnieper', 'Dnieper_River', 'Don', 'Don_River', 'Ebro', 'Ebro_River', 'Elbe', 'Elbe_River', 'Elizabeth_River', 'estuary', 'Euphrates', 'Euphrates_River', 'Flint', 'Flint_River', 'floodplain', 'flood_plain', 'Forth', 'Forth_River', 'Fox_River', 'Ganges', 'Ganges_River', 'Gan_Jiang', 'Kan_River', 'Garonne', 'Garonne_River', 'Gila', 'Gila_River', 
'gorge', 'Grand_River', 'Green', 'Green_River', 'headstream', 'Housatonic', 'Housatonic_River', 'Huang_He', 'Hwang_Ho', 'Yellow_River', 'Hudson', 'Hudson_River', 'IJssel', 'IJssel_river', 'Illinois_River', 'Indigirka', 'Indigirka_River', 'Indus', 'Indus_River', 'Irrawaddy', 'Irrawaddy_River', 'Irtish', 'Irtish_River', 'Irtysh', 'Irtysh_River', 'Isere', 'Isere_River', 'James', 'James_River', 'James', 'James_River', 'Jordan', 'Jordan_River', 'Kansas', 'Kansas_River', 'Kaw_River', 'Kasai', 'Kasai_River', 'River_Kasai', 'Kissimmee', 'Kissimmee_River', 'Klamath', 'Klamath_River', 'Kura', 'Kura_River', 'Lake_Chad', 'Chad', 'Lehigh_River', 'Lena', 'Lena_River', 'Lethe', 'River_Lethe', 'liman', 'Limpopo', 'Crocodile_River', 'Little_Bighorn', 'Little_Bighorn_River', 'Little_Horn', 'Little_Missouri', 'Little_Missouri_River', 'Little_Sioux_River', 'Little_Wabash', 'Little_Wabash_River', 'Loire', 'Loire_River', 'Mackenzie', 'Mackenzie_River', 'Madeira', 'Madeira_River', 'Magdalena', 'Magdalena_River', 'meander', 'Mekong', 'Mekong_River', 'Merrimack', 'Merrimack_River', 'Meuse', 'Meuse_River', 'Milk', 'Milk_River', 'Mississippi', 'Mississippi_River', 'Missouri', 'Missouri_River', 'Mobile', 'Mobile_River', 'Mohawk_River', 'Monongahela', 'Monongahela_River', 'Moreau_River', 'Murray', 'Murray_River', 'Murrumbidgee', 'Murrumbidgee_River', 'Namoi', 'Namoi_River', 'Nan', 'Nan_River', 'Neckar', 'Neckar_River', 'Neosho', 'Neosho_River', 'Neva', 'Neva_River', 'New_River', 'Niagara', 'Niagara_River', 'Niger', 'Niger_River', 'Nile', 'Nile_River', 'North_Platte', 'North_Platte_River', 'Ob', 'Ob_River', 'Oder', 'Oder_River', 'Ohio', 'Ohio_River', 'Orange', 'Orange_River', 'Orinoco', 'Orinoco_River', 'Osage', 'Osage_River', 'Outaouais', 'Ottawa', 'Ottawa_river', 'Ouachita', 'Ouachita_River', 'Ouse', 'Ouse_River', 'oxbow', 'oxbow_lake', 'Parana', 'Parana_River', 'Parnaiba', 'Parnahiba', 'Pearl_River', 'Pee_Dee', 'Pee_Dee_River', 'Penobscot', 'Penobscot_River', 'Ping', 'Ping_River', 'Platte', 
'Platte_River', 'Po', 'Po_River', 'Potomac', 'Potomac_River', 'Purus', 'Purus_River', 'rapid', 'Rappahannock', 'Rappahannock_River', 'Rhine', 'Rhine_River', 'Rhein', 'Rhone', 'Rhone_River', 'Rio_Grande', 'Rio_Bravo', 'riparian_forest', 'riverbank', 'riverside', 'riverbed', 'river_bottom', 'river_boulder', 'Russian_River', 'Saale', 'Saale_River', 'Sabine', 'Sabine_River', 'Sacramento_River', 'Saint_John', 'Saint_John_River', 'St._John', 'St._John_River', 'Saint_Johns', 'Saint_Johns_River', 'St._Johns', 'St._Johns_River', 'Saint_Lawrence', 'Saint_Lawrence_River', 'St._Lawrence', 'St._Lawrence_River', 'Sambre', 'Sambre_River', 'sandbank', 'San_Joaquin_River', 'Sao_Francisco', 'Saone', 'Saone_River', 'Savannah', 'Savannah_River', 'Scheldt', 'Scheldt_River', 'Seine', 'Seine_River', 'Severn', 'River_Severn', 'Severn_River', 'Severn', 'Severn_River', 'Seyhan', 'Seyhan_River', 'Shari', 'Shari_River', 'Chari', 'Chari_River', 'Shenandoah_River', 'Styx', 'River_Styx', 'Sun_River', 'Suriname_River', 'Surinam_River', 'Susquehanna', 'Susquehanna_River', 'Tagus', 'Tagus_River', 'Tallapoosa', 'Tallapoosa_River', 'Tennessee', 'Tennessee_River', 'Thames', 'River_Thames', 'Thames_River', 'Tiber', 'Tevere', 'Tigris', 'Tigris_River', 'Tocantins', 'Tocantins_River', 'Tombigbee', 'Tombigbee_River', 'Trent', 'River_Trent', 'Trent_River', 'Trinity_River', 'Tunguska', 'Lower_Tunguska', 'Tunguska', 'Stony_Tunguska', 'Tyne', 'River_Tyne', 'Tyne_River', 'Urubupunga', 'Urubupunga_Falls', 'Uruguay_River', 'valley', 'vale', 'Vetluga', 'Vetluga_River', 'Vistula', 'Vistula_River', 'Volga', 'Volga_River', 'Volkhov', 'Volkhov_River', 'Volta', 'waterfall', 'falls', 'water_system', 'Weser', 'Weser_River', 'Willamette', 'Willamette_River', 'Yalu', 'Yalu_River', 'Chang_Jiang', 'Changjiang', 'Chang', 'Yangtze', 'Yangtze_River', 'Yangtze_Kiang', 'Yazoo', 'Yazoo_River', 'Yenisei', 'Yenisei_River', 'Yenisey', 'Yenisey_River', 'Yukon', 'Yukon_River', 'Zambezi', 'Zambezi_River', 'Zhu_Jiang', 'Canton_River', 
'Chu_Kiang', 'Pearl_River', 'Charon', 'naiad', 'Achilles', 'finisher', 'Algonkian', 'Algonkin', 'Arikara', 'Aricara', 'Chinook', 'Conoy', 'Halchidhoma', 'Hidatsa', 'Gros_Ventre', 'Kansa', 'Kansas', 'Karok', 'Maidu', 'Maricopa', 'Missouri', 'Mohave', 'Mojave', 'Ofo', 'Omaha', 'Maha', 'Osage', 'Oto', 'Otoe', 'Pamlico', 'Ponca', 'Ponka', 'Quapaw', 'Shahaptian', 'Sahaptin', 'Sahaptino', 'Shawnee', 'Tsimshian', 'Walapai', 'Hualapai', 'Hualpai', 'Yeniseian', 'Yakut', 'charioteer', 'driver', 'honker', 'lasher', 'mahout', 'nondriver', 'road_hog', 'roadhog', 'speeder', 'speed_demon', 'tailgater', 'teamster', 'test_driver', 'wagoner', 'waggoner', 'Cartier', 'Jacques_Cartier', 'Oldfield', 'Barney_Oldfield', 'Berna_Eli_Oldfield', 'debacle', 'bald_cypress', 'swamp_cypress', 'pond_bald_cypress', 'southern_cypress', 'Taxodium_distichum', 'Montezuma_cypress', 'Mexican_swamp_cypress', 'Taxodium_mucronatum', 'pistia', 'water_lettuce', 'water_cabbage', 'Pistia_stratiotes', 'Pistia_stratoites', 'great_yellowcress', 'Rorippa_amphibia', 'Nasturtium_amphibium', 'giant_reed', 'Arundo_donax', 'Phragmites', 'genus_Phragmites', 'black_birch', 'river_birch', 'red_birch', 'Betula_nigra', 'river_red_gum', 'river_gum', 'Eucalyptus_camaldulensis', 'Eucalyptus_rostrata', 'false_indigo', 'bastard_indigo', 'Amorpha_fruticosa', 'thermal_pollution', 'water_pollution', 'alluvial_soil', 'Senegal_gum', 'silt']
But it seems there is still some noise from the "brute force" method. Let's assume that the name of a river must start with an uppercase letter, and try:
list(chain(*[ [j for j in i.lemma_names if j[0].isupper()] for i in wn.all_synsets() if "river" in i.definition]))
[out]:
['Acheronian', 'Acherontic', 'Stygian', 'Lao', 'Aegospotami', 'Aegospotamos', 'Yalu_River', 'Gasterosteus_aculeatus', 'Gasterosteus_pungitius', 'Cryptobranchus_alleganiensis', 'Scaphiopus_bombifrons', 'Pseudemys_concinna', 'Trionyx_spiniferus', 'Trionyx_muticus', 'Anas_acuta', 'Ancylus', 'Palaemon_australis', 'Platanistidae', 'Hippopotamus_amphibius', 'Australian_lungfish', 'Queensland_lungfish', 'Neoceratodus_forsteri', 'Alosa_pseudoharengus', 'Pomolobus_pseudoharengus', 'Oncorhynchus_nerka', 'Salmo_trutta', 'Australian_arowana', 'Dawson_River_salmon', 'Scleropages_leichardti', 'Australian_bonytongue', 'Scleropages_jardinii', 'Roccus_saxatilis', 'Tilapia_nilotica', 'Chinese_paddlefish', 'Psephurus_gladis', 'Augean_stables', 'Lake_Volta', 'L-plate', 'Phillips_screwdriver', 'Copehan', 'Volgaic', 'Teamsters_Union', 'Mammoth_Cave_National_Park', 'Zion_National_Park', 'Yangon', 'Rangoon', "N'Djamena", 'Ndjamena', 'Fort-Lamy', 'Kinshasa', 'Leopoldville', 'Saxony', 'Sachsen', 'Saxe', 'Cologne', 'Koln', 'Mannheim', 'Rhineland', 'Rheinland', 'Ruhr', 'Ruhr_Valley', 'West_Bank', 'Pennines', 'Pennine_Chain', 'Ottawa', 'Canadian_capital', 'Antwerpen', 'Antwerp', 'Anvers', 'Orleans', 'Rhone-Alpes', 'Friesland', 'Timbuktu', 'Bydgoszcz', 'Bromberg', 'Novosibirsk', 'Tbilisi', 'Tiflis', 'Toledo', 'Selma', 'Denver', 'Mile-High_City', 'Hartford', 'Savannah', 'Topeka', 'Louisville', 'New_Orleans', 'Detroit', 'Motor_City', 'Motown', 'Minneapolis', 'Saint_Paul', 'St._Paul', 'Jefferson_City', 'Saint_Louis', 'St._Louis', 'Gateway_to_the_West', 'Billings', 'Great_Falls', 'Omaha', 'Concord', 'Manchester', 'Trenton', 'Albuquerque', 'New_Netherland', 'Albany', 'Erie_Canal', 'New_York', 'New_York_City', 'Greater_New_York', 'West_Point', 'Niagara_Falls', 'Schenectady', 'Bismarck', 'Fargo', 'Cincinnati', 'Tulsa', 'Chester', 'Philadelphia', 'City_of_Brotherly_Love', 'Pierre', 'Mount_Vernon', 'Charleston', 'Huntington', 'Morgantown', 'Parkersburg', 'Wheeling', 'Casper', 'Ciudad_Bolivar', 'Aare', 
'Aar', 'Aare_River', 'Acheron', 'River_Acheron', 'Adige', 'River_Adige', 'Aire', 'River_Aire', 'Aire_River', 'Alabama', 'Alabama_River', 'Allegheny', 'Allegheny_River', 'Amazon', 'Amazon_River', 'Amur', 'Amur_River', 'Heilong_Jiang', 'Heilong', 'Angara', 'Angara_River', 'Tunguska', 'Upper_Tunguska', 'Apalachicola', 'Apalachicola_River', 'Araguaia', 'Araguaia_River', 'Araguaya', 'Araguaya_River', 'Aras', 'Araxes', 'Arauca', 'Argun', 'Argun_River', 'Ergun_He', 'Arkansas', 'Arkansas_River', 'Arno', 'Arno_River', 'River_Arno', 'Avon', 'River_Avon', 'Upper_Avon', 'Upper_Avon_River', 'Avon', 'River_Avon', 'Bighorn', 'Bighorn_River', 'Big_Sioux_River', 'Brahmaputra', 'Brahmaputra_River', 'Brazos', 'Brazos_River', 'Caloosahatchee', 'Caloosahatchee_River', 'Cam', 'River_Cam', 'Cam_River', 'Canadian', 'Canadian_River', 'Cape_Fear_River', 'Chao_Phraya', 'Charles', 'Charles_River', 'Chattahoochee', 'Chattahoochee_River', 'Cimarron', 'Cimarron_River', 'Clinch_River', 'Clyde', 'Cocytus', 'River_Cocytus', 'Colorado', 'Colorado_River', 'Colorado', 'Colorado_River', 'Columbia', 'Columbia_River', 'Congo', 'Congo_River', 'Zaire_River', 'Connecticut', 'Connecticut_River', 'Coosa', 'Coosa_River', 'Cumberland', 'Cumberland_River', 'Danube', 'Danube_River', 'Danau', 'Darling', 'Darling_River', 'Delaware', 'Delaware_River', 'Demerara', 'Detroit_River', 'Dnieper', 'Dnieper_River', 'Don', 'Don_River', 'Ebro', 'Ebro_River', 'Elbe', 'Elbe_River', 'Elizabeth_River', 'Euphrates', 'Euphrates_River', 'Flint', 'Flint_River', 'Forth', 'Forth_River', 'Fox_River', 'Ganges', 'Ganges_River', 'Gan_Jiang', 'Kan_River', 'Garonne', 'Garonne_River', 'Gila', 'Gila_River', 'Grand_River', 'Green', 'Green_River', 'Housatonic', 'Housatonic_River', 'Huang_He', 'Hwang_Ho', 'Yellow_River', 'Hudson', 'Hudson_River', 'IJssel', 'IJssel_river', 'Illinois_River', 'Indigirka', 'Indigirka_River', 'Indus', 'Indus_River', 'Irrawaddy', 'Irrawaddy_River', 'Irtish', 'Irtish_River', 'Irtysh', 'Irtysh_River', 'Isere', 
'Isere_River', 'James', 'James_River', 'James', 'James_River', 'Jordan', 'Jordan_River', 'Kansas', 'Kansas_River', 'Kaw_River', 'Kasai', 'Kasai_River', 'River_Kasai', 'Kissimmee', 'Kissimmee_River', 'Klamath', 'Klamath_River', 'Kura', 'Kura_River', 'Lake_Chad', 'Chad', 'Lehigh_River', 'Lena', 'Lena_River', 'Lethe', 'River_Lethe', 'Limpopo', 'Crocodile_River', 'Little_Bighorn', 'Little_Bighorn_River', 'Little_Horn', 'Little_Missouri', 'Little_Missouri_River', 'Little_Sioux_River', 'Little_Wabash', 'Little_Wabash_River', 'Loire', 'Loire_River', 'Mackenzie', 'Mackenzie_River', 'Madeira', 'Madeira_River', 'Magdalena', 'Magdalena_River', 'Mekong', 'Mekong_River', 'Merrimack', 'Merrimack_River', 'Meuse', 'Meuse_River', 'Milk', 'Milk_River', 'Mississippi', 'Mississippi_River', 'Missouri', 'Missouri_River', 'Mobile', 'Mobile_River', 'Mohawk_River', 'Monongahela', 'Monongahela_River', 'Moreau_River', 'Murray', 'Murray_River', 'Murrumbidgee', 'Murrumbidgee_River', 'Namoi', 'Namoi_River', 'Nan', 'Nan_River', 'Neckar', 'Neckar_River', 'Neosho', 'Neosho_River', 'Neva', 'Neva_River', 'New_River', 'Niagara', 'Niagara_River', 'Niger', 'Niger_River', 'Nile', 'Nile_River', 'North_Platte', 'North_Platte_River', 'Ob', 'Ob_River', 'Oder', 'Oder_River', 'Ohio', 'Ohio_River', 'Orange', 'Orange_River', 'Orinoco', 'Orinoco_River', 'Osage', 'Osage_River', 'Outaouais', 'Ottawa', 'Ottawa_river', 'Ouachita', 'Ouachita_River', 'Ouse', 'Ouse_River', 'Parana', 'Parana_River', 'Parnaiba', 'Parnahiba', 'Pearl_River', 'Pee_Dee', 'Pee_Dee_River', 'Penobscot', 'Penobscot_River', 'Ping', 'Ping_River', 'Platte', 'Platte_River', 'Po', 'Po_River', 'Potomac', 'Potomac_River', 'Purus', 'Purus_River', 'Rappahannock', 'Rappahannock_River', 'Rhine', 'Rhine_River', 'Rhein', 'Rhone', 'Rhone_River', 'Rio_Grande', 'Rio_Bravo', 'Russian_River', 'Saale', 'Saale_River', 'Sabine', 'Sabine_River', 'Sacramento_River', 'Saint_John', 'Saint_John_River', 'St._John', 'St._John_River', 'Saint_Johns', 'Saint_Johns_River', 
'St._Johns', 'St._Johns_River', 'Saint_Lawrence', 'Saint_Lawrence_River', 'St._Lawrence', 'St._Lawrence_River', 'Sambre', 'Sambre_River', 'San_Joaquin_River', 'Sao_Francisco', 'Saone', 'Saone_River', 'Savannah', 'Savannah_River', 'Scheldt', 'Scheldt_River', 'Seine', 'Seine_River', 'Severn', 'River_Severn', 'Severn_River', 'Severn', 'Severn_River', 'Seyhan', 'Seyhan_River', 'Shari', 'Shari_River', 'Chari', 'Chari_River', 'Shenandoah_River', 'Styx', 'River_Styx', 'Sun_River', 'Suriname_River', 'Surinam_River', 'Susquehanna', 'Susquehanna_River', 'Tagus', 'Tagus_River', 'Tallapoosa', 'Tallapoosa_River', 'Tennessee', 'Tennessee_River', 'Thames', 'River_Thames', 'Thames_River', 'Tiber', 'Tevere', 'Tigris', 'Tigris_River', 'Tocantins', 'Tocantins_River', 'Tombigbee', 'Tombigbee_River', 'Trent', 'River_Trent', 'Trent_River', 'Trinity_River', 'Tunguska', 'Lower_Tunguska', 'Tunguska', 'Stony_Tunguska', 'Tyne', 'River_Tyne', 'Tyne_River', 'Urubupunga', 'Urubupunga_Falls', 'Uruguay_River', 'Vetluga', 'Vetluga_River', 'Vistula', 'Vistula_River', 'Volga', 'Volga_River', 'Volkhov', 'Volkhov_River', 'Volta', 'Weser', 'Weser_River', 'Willamette', 'Willamette_River', 'Yalu', 'Yalu_River', 'Chang_Jiang', 'Changjiang', 'Chang', 'Yangtze', 'Yangtze_River', 'Yangtze_Kiang', 'Yazoo', 'Yazoo_River', 'Yenisei', 'Yenisei_River', 'Yenisey', 'Yenisey_River', 'Yukon', 'Yukon_River', 'Zambezi', 'Zambezi_River', 'Zhu_Jiang', 'Canton_River', 'Chu_Kiang', 'Pearl_River', 'Charon', 'Achilles', 'Algonkian', 'Algonkin', 'Arikara', 'Aricara', 'Chinook', 'Conoy', 'Halchidhoma', 'Hidatsa', 'Gros_Ventre', 'Kansa', 'Kansas', 'Karok', 'Maidu', 'Maricopa', 'Missouri', 'Mohave', 'Mojave', 'Ofo', 'Omaha', 'Maha', 'Osage', 'Oto', 'Otoe', 'Pamlico', 'Ponca', 'Ponka', 'Quapaw', 'Shahaptian', 'Sahaptin', 'Sahaptino', 'Shawnee', 'Tsimshian', 'Walapai', 'Hualapai', 'Hualpai', 'Yeniseian', 'Yakut', 'Cartier', 'Jacques_Cartier', 'Oldfield', 'Barney_Oldfield', 'Berna_Eli_Oldfield', 'Taxodium_distichum', 
'Montezuma_cypress', 'Mexican_swamp_cypress', 'Taxodium_mucronatum', 'Pistia_stratiotes', 'Pistia_stratoites', 'Rorippa_amphibia', 'Nasturtium_amphibium', 'Arundo_donax', 'Phragmites', 'Betula_nigra', 'Eucalyptus_camaldulensis', 'Eucalyptus_rostrata', 'Amorpha_fruticosa', 'Senegal_gum']
Let's go crazy and say that a lemma names a river only if the word "River" appears in it:
>>> list(chain(*[ [j for j in i.lemma_names if j[0].isupper() and "River" in j] for i in wn.all_synsets() if "river" in i.definition]))
[out]:
['Yalu_River', 'Dawson_River_salmon', 'Aare_River', 'River_Acheron', 'River_Adige', 'River_Aire', 'Aire_River', 'Alabama_River', 'Allegheny_River', 'Amazon_River', 'Amur_River', 'Angara_River', 'Apalachicola_River', 'Araguaia_River', 'Araguaya_River', 'Argun_River', 'Arkansas_River', 'Arno_River', 'River_Arno', 'River_Avon', 'Upper_Avon_River', 'River_Avon', 'Bighorn_River', 'Big_Sioux_River', 'Brahmaputra_River', 'Brazos_River', 'Caloosahatchee_River', 'River_Cam', 'Cam_River', 'Canadian_River', 'Cape_Fear_River', 'Charles_River', 'Chattahoochee_River', 'Cimarron_River', 'Clinch_River', 'River_Cocytus', 'Colorado_River', 'Colorado_River', 'Columbia_River', 'Congo_River', 'Zaire_River', 'Connecticut_River', 'Coosa_River', 'Cumberland_River', 'Danube_River', 'Darling_River', 'Delaware_River', 'Detroit_River', 'Dnieper_River', 'Don_River', 'Ebro_River', 'Elbe_River', 'Elizabeth_River', 'Euphrates_River', 'Flint_River', 'Forth_River', 'Fox_River', 'Ganges_River', 'Kan_River', 'Garonne_River', 'Gila_River', 'Grand_River', 'Green_River', 'Housatonic_River', 'Yellow_River', 'Hudson_River', 'Illinois_River', 'Indigirka_River', 'Indus_River', 'Irrawaddy_River', 'Irtish_River', 'Irtysh_River', 'Isere_River', 'James_River', 'James_River', 'Jordan_River', 'Kansas_River', 'Kaw_River', 'Kasai_River', 'River_Kasai', 'Kissimmee_River', 'Klamath_River', 'Kura_River', 'Lehigh_River', 'Lena_River', 'River_Lethe', 'Crocodile_River', 'Little_Bighorn_River', 'Little_Missouri_River', 'Little_Sioux_River', 'Little_Wabash_River', 'Loire_River', 'Mackenzie_River', 'Madeira_River', 'Magdalena_River', 'Mekong_River', 'Merrimack_River', 'Meuse_River', 'Milk_River', 'Mississippi_River', 'Missouri_River', 'Mobile_River', 'Mohawk_River', 'Monongahela_River', 'Moreau_River', 'Murray_River', 'Murrumbidgee_River', 'Namoi_River', 'Nan_River', 'Neckar_River', 'Neosho_River', 'Neva_River', 'New_River', 'Niagara_River', 'Niger_River', 'Nile_River', 'North_Platte_River', 'Ob_River', 'Oder_River', 
'Ohio_River', 'Orange_River', 'Orinoco_River', 'Osage_River', 'Ouachita_River', 'Ouse_River', 'Parana_River', 'Pearl_River', 'Pee_Dee_River', 'Penobscot_River', 'Ping_River', 'Platte_River', 'Po_River', 'Potomac_River', 'Purus_River', 'Rappahannock_River', 'Rhine_River', 'Rhone_River', 'Russian_River', 'Saale_River', 'Sabine_River', 'Sacramento_River', 'Saint_John_River', 'St._John_River', 'Saint_Johns_River', 'St._Johns_River', 'Saint_Lawrence_River', 'St._Lawrence_River', 'Sambre_River', 'San_Joaquin_River', 'Saone_River', 'Savannah_River', 'Scheldt_River', 'Seine_River', 'River_Severn', 'Severn_River', 'Severn_River', 'Seyhan_River', 'Shari_River', 'Chari_River', 'Shenandoah_River', 'River_Styx', 'Sun_River', 'Suriname_River', 'Surinam_River', 'Susquehanna_River', 'Tagus_River', 'Tallapoosa_River', 'Tennessee_River', 'River_Thames', 'Thames_River', 'Tigris_River', 'Tocantins_River', 'Tombigbee_River', 'River_Trent', 'Trent_River', 'Trinity_River', 'River_Tyne', 'Tyne_River', 'Uruguay_River', 'Vetluga_River', 'Vistula_River', 'Volga_River', 'Volkhov_River', 'Weser_River', 'Willamette_River', 'Yalu_River', 'Yangtze_River', 'Yazoo_River', 'Yenisei_River', 'Yenisey_River', 'Yukon_River', 'Zambezi_River', 'Canton_River', 'Pearl_River']
Much better, but I think you're better off just crawling the names from http://en.wikipedia.org/wiki/Lists_of_rivers . Have fun!
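The filtered output above still contains repeats (for example 'Colorado_River' appears twice) and at least one false positive ('Dawson_River_salmon', which is a fish). A minimal sketch of a stricter post-filter follows; the function name `tidy_river_names` and the rule "the first or last token must be River" are assumptions made for illustration, not part of NLTK.

```python
def tidy_river_names(lemma_names):
    """Keep lemmas whose first or last underscore-separated token is
    'River', drop exact duplicates, and preserve the original order."""
    seen = set()
    rivers = []
    for name in lemma_names:
        tokens = name.split("_")
        # Rejects names such as 'Dawson_River_salmon', where 'River'
        # only appears in the middle.
        if "River" not in (tokens[0], tokens[-1]):
            continue
        if name in seen:
            continue
        seen.add(name)
        rivers.append(name.replace("_", " "))
    return rivers


# A few lemma names taken from the output above.
sample = ['Yalu_River', 'Dawson_River_salmon', 'River_Acheron',
          'Colorado_River', 'Colorado_River', 'River_Thames', 'Thames_River']
result = tidy_river_names(sample)
print(result)
```

Note that dropping duplicates also merges distinct rivers that happen to share a name, so keep the synsets themselves if that distinction matters.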
To show that the "river" solution using NLTK WordNet won't scale to other entities, and also to answer #tripleee's question:
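If you're looking for animals, you can simply get all hyponyms of the animal synset recursively, like this:
animal = wn.synset('animal.n.01')  # the original snippet referred to an undefined variable named vehicle
list(set([w for s in animal.closure(lambda s:s.hyponyms()) for w in s.lemma_names]))  # on NLTK 3+, call s.lemma_names()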
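To make explicit what a recursive walk over hyponyms computes, here is a small self-contained sketch on a toy graph. The `HYPONYMS` dictionary and the `closure` helper are hypothetical stand-ins written for this example; they are not NLTK's implementation, and real data would come from WordNet.

```python
from collections import deque

# Hypothetical toy hyponym graph (parent -> more specific terms).
HYPONYMS = {
    "animal": ["bird", "fish"],
    "bird": ["duck"],
    "duck": ["teal", "pintail"],
    "fish": ["salmon"],
    "salmon": ["sockeye"],
}


def closure(root, relation):
    """Return every node reachable from root through relation,
    excluding root itself, in breadth-first order."""
    seen = set()
    order = []
    queue = deque(relation.get(root, []))
    while queue:
        node = queue.popleft()
        if node in seen:
            continue
        seen.add(node)
        order.append(node)
        queue.extend(relation.get(node, []))
    return order


descendants = closure("animal", HYPONYMS)
print(descendants)
```

This works for animals because WordNet has one synset that all animals descend from; the "river" case above had no such clean hierarchy to rely on, which is why it needed the definition-matching heuristics.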
Alternatively, we can use part-of-speech tagging to pick out the proper nouns and then filter them for the required output.
Perhaps:
from nltk.tag import pos_tag
sentence = "Amazon is great river. Mississippi is awesome too."
tagged_sent = pos_tag(sentence.split())
tagged_sent will contain tagged output similar to the following, where NNP marks a proper noun:
[('Amazon', 'NNP'), ('is', 'VBZ'), ('great', 'JJ'), ('river.', 'NNP'), ('Mississippi', 'NNP'), ('is', 'VBZ'), ('awesome', 'VBN'), ('too.', '-NONE-')]
propernouns = [word for word,pos in tagged_sent if pos == 'NNP']
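propernouns would then contain the following (note that 'river.' slips through, likely because sentence.split() leaves the full stop attached to the word):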
['Amazon', 'river.', 'Mississippi']
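The list above includes 'river.', which is not a name. A minimal sketch of a post-filter, assuming the tagged output shown earlier (hard-coded here so the snippet runs without NLTK); the helper name `clean_proper_nouns` and the capitalisation rule are assumptions for illustration:

```python
import string

# Tagged output copied from the example above.
tagged_sent = [('Amazon', 'NNP'), ('is', 'VBZ'), ('great', 'JJ'),
               ('river.', 'NNP'), ('Mississippi', 'NNP'), ('is', 'VBZ'),
               ('awesome', 'VBN'), ('too.', '-NONE-')]


def clean_proper_nouns(tagged):
    """Keep NNP tokens, strip surrounding punctuation, and drop
    anything that does not start with an uppercase letter."""
    names = []
    for word, pos in tagged:
        if pos != 'NNP':
            continue
        word = word.strip(string.punctuation)
        if word and word[0].isupper():
            names.append(word)
    return names


names = clean_proper_nouns(tagged_sent)
print(names)
```

Tokenizing the sentence properly before tagging, rather than calling sentence.split(), would probably avoid the stray 'river.' in the first place.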
You can then assign a category to each proper noun and use a function to return the names for a given category.
If you want lists of things of a specific type, e.g. rivers, then http://dbPedia.org, http://freebase.com or http://wikidata.org are the better choice.
This dbPedia SPARQL query returns all rivers known to Wikipedia:
SELECT ?name ?description WHERE {
{?river rdf:type dbpedia-owl:River} .
?river foaf:name ?name .
?river rdfs:comment ?description .
}
ORDER BY ?name
http://bit.ly/1jc8Ip6
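A rough sketch of sending such a query from Python with only the standard library. The endpoint URL `https://dbpedia.org/sparql`, the `format` parameter, the explicit PREFIX lines and the `LIMIT 10` are assumptions added for this example; the helper names `build_url` and `fetch_names` are hypothetical. Only the URL is built when the snippet runs, so no network access is needed to try it.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

ENDPOINT = "https://dbpedia.org/sparql"  # assumed public endpoint

# Same query as above, with prefixes spelled out and a LIMIT added
# (both are additions for this sketch).
QUERY = """
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>
SELECT ?name ?description WHERE {
  ?river rdf:type dbpedia-owl:River .
  ?river foaf:name ?name .
  ?river rdfs:comment ?description .
}
ORDER BY ?name
LIMIT 10
"""


def build_url(query, endpoint=ENDPOINT):
    """Encode the query as a GET request asking for JSON results."""
    params = urlencode({"query": query,
                        "format": "application/sparql-results+json"})
    return endpoint + "?" + params


def fetch_names(query):
    """Run the query and return the ?name values (needs network access)."""
    with urlopen(build_url(query), timeout=30) as response:
        data = json.load(response)
    return [row["name"]["value"] for row in data["results"]["bindings"]]


url = build_url(QUERY)
print(url)
```

Calling `fetch_names(QUERY)` would perform the actual request; it is left uncalled here because the result depends on the live service.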
