Related
I am trying to sort this multidimensional array after the number on the first index using the merge sort algorithm, but I am very unsure on how to do so.
This is the multidimensional array I am trying to sort:
Orders_db = [[1347517405, 54413, '78'], [1347517413, 54421, '86'], [1347517454, 54462, '127'], [1347517460, 54468, '133'], [1347517461, 54469, '134'], [1347517426, 54434, '99'], [1347517464, 54472, '137'], [1347517394, 54402, '67'], [1347517445, 54453, '118'], [1347517375, 54383, '48'], [1347517377, 54385, '50'], [1347517392, 54400, '65'], [1347517450, 54458, '123'], [1347517404, 54412, '77'], [1347517389, 54397, '62'], [1347517393, 54401, '66'], [1347517440, 54448, '113'], [1347517457, 54465, '130'], [1347517444, 54452, '117'], [1347517400, 54408, '73'], [1347517412, 54420, '85'], [1347517371, 54379, '44'], [1347517415, 54423, '88'], [1347517441, 54449, '114'], [1347517435, 54443, '108'], [1347517409, 54417, '82'], [1347517398, 54406, '71'], [1347517422, 54430, '95'], [1347517468, 54476, '141'], [1347517402, 54410, '75'], [1347517437, 54445, '110'], [1347517446, 54454, '119'], [1347517382, 54390, '55'], [1347517399, 54407, '72'], [1347517438, 54446, '111'], [1347517416, 54424, '89'], [1347517380, 54388, '53'], [1347517425, 54433, '98'], [1347517406, 54414, '79'], [1347517449, 54457, '122'], [1347517388, 54396, '61'], [1347517430, 54438, '103'], [1347517455, 54463, '128'], [1347517458, 54466, '131'], [1347517452, 54460, '125'], [1347517396, 54404, '69'], [1347517423, 54431, '96'], [1347517465, 54473, '138'], [1347517397, 54405, '70'], [1347517459, 54467, '132'], [1347517395, 54403, '68'], [1347517381, 54389, '54'], [1347517424, 54432, '97'], [1347517436, 54444, '109'], [1347517434, 54442, '107'], [1347517401, 54409, '74'], [1347517376, 54384, '49'], [1347517467, 54475, '140'], [1347517456, 54464, '129'], [1347517427, 54435, '100'], [1347517383, 54391, '56'], [1347517451, 54459, '124'], [1347517433, 54441, '106'], [1347517414, 54422, '87'], [1347517417, 54425, '90'], [1347517453, 54461, '126'], [1347517378, 54386, '51'], [1347517432, 54440, '105'], [1347517403, 54411, '76'], [1347517439, 54447, '112'], [1347517448, 54456, '121'], [1347517410, 54418, '83'], [1347517391, 54399, '64'], [1347517447, 54455, '120'], [1347517421, 54429, '94'], [1347517379, 54387, '52'], [1347517411, 54419, '84'], [1347517386, 54394, '59'], [1347517384, 54392, '57'], [1347517374, 54382, '47'], [1347517462, 54470, '135'], [1347517431, 54439, '104'], [1347517419, 54427, '92'], [1347517428, 54436, '101'], [1347517466, 54474, '139'], [1347517443, 54451, '116'], [1347517463, 54471, '136'], [1347517385, 54393, '58'], [1347517387, 54395, '60'], [1347517373, 54381, '46'], [1347517372, 54380, '45'], [1347517418, 54426, '91'], [1347517420, 54428, '93'], [1347517469, 54477, '142]'], [1347517442, 54450, '115'], [1347517408, 54416, '81'], [1347517390, 54398, '63'], [1347517407, 54415, '80'], [1347517429, 54437, '102']]
And I can implement a general merge sort algorithm, but i cannot do it in a way where I sort after the number in the array on the first index.
My implementation of merge sort is:
def merge_sort(arr):
if len(arr) > 1:
mid = len(arr) // 2
left = arr[:mid]
right = arr[mid:]
merge_sort(left)
merge_sort(right)
i = j = k = 0
while i < len(left) and j < len(right):
if left[i] < right[j]:
arr[k] = left[i]
i += 1
else:
arr[k] = right[j]
j += 1
k += 1
while i < len(left):
arr[k] = left[i]
i += 1
k += 1
while j < len(right):
arr[k] = right[j]
j += 1
k += 1
return arr
How can I fit the general merge sort algorithm to sort this array?
The preferred answer is that it returns the array with the highest number in the end.
Try below:
def merge(left, right):
if not len(left) or not len(right):
return left or right
result = []
i, j = 0, 0
while (len(result) < len(left) + len(right)):
if left[i][0] < right[j][0]:
result.append(left[i])
i+= 1
else:
result.append(right[j])
j+= 1
if i == len(left) or j == len(right):
result.extend(left[i:] or right[j:])
break
return result
def mergesort(list):
if len(list) < 2:
return list
middle = len(list)/2
left = mergesort(list[:middle])
right = mergesort(list[middle:])
return merge(left, right)
seq = [[1347517405, 54413, '78'], [1347517413, 54421, '86'], [1347517454, 54462, '127'], [1347517460, 54468, '133'], [1347517461, 54469, '134'], [1347517426, 54434, '99'], [1347517464, 54472, '137'], [1347517394, 54402, '67'], [1347517445, 54453, '118'], [1347517375, 54383, '48'], [1347517377, 54385, '50'], [1347517392, 54400, '65'], [1347517450, 54458, '123'], [1347517404, 54412, '77'], [1347517389, 54397, '62'], [1347517393, 54401, '66'], [1347517440, 54448, '113'], [1347517457, 54465, '130'], [1347517444, 54452, '117'], [1347517400, 54408, '73'], [1347517412, 54420, '85'], [1347517371, 54379, '44'], [1347517415, 54423, '88'], [1347517441, 54449, '114'], [1347517435, 54443, '108'], [1347517409, 54417, '82'], [1347517398, 54406, '71'], [1347517422, 54430, '95'], [1347517468, 54476, '141'], [1347517402, 54410, '75'], [1347517437, 54445, '110'], [1347517446, 54454, '119'], [1347517382, 54390, '55'], [1347517399, 54407, '72'], [1347517438, 54446, '111'], [1347517416, 54424, '89'], [1347517380, 54388, '53'], [1347517425, 54433, '98'], [1347517406, 54414, '79'], [1347517449, 54457, '122'], [1347517388, 54396, '61'], [1347517430, 54438, '103'], [1347517455, 54463, '128'], [1347517458, 54466, '131'], [1347517452, 54460, '125'], [1347517396, 54404, '69'], [1347517423, 54431, '96'], [1347517465, 54473, '138'], [1347517397, 54405, '70'], [1347517459, 54467, '132'], [1347517395, 54403, '68'], [1347517381, 54389, '54'], [1347517424, 54432, '97'], [1347517436, 54444, '109'], [1347517434, 54442, '107'], [1347517401, 54409, '74'], [1347517376, 54384, '49'], [1347517467, 54475, '140'], [1347517456, 54464, '129'], [1347517427, 54435, '100'], [1347517383, 54391, '56'], [1347517451, 54459, '124'], [1347517433, 54441, '106'], [1347517414, 54422, '87'], [1347517417, 54425, '90'], [1347517453, 54461, '126'], [1347517378, 54386, '51'], [1347517432, 54440, '105'], [1347517403, 54411, '76'], [1347517439, 54447, '112'], [1347517448, 54456, '121'], [1347517410, 54418, '83'], [1347517391, 54399, '64'], [1347517447, 54455, '120'], [1347517421, 54429, '94'], [1347517379, 54387, '52'], [1347517411, 54419, '84'], [1347517386, 54394, '59'], [1347517384, 54392, '57'], [1347517374, 54382, '47'], [1347517462, 54470, '135'], [1347517431, 54439, '104'], [1347517419, 54427, '92'], [1347517428, 54436, '101'], [1347517466, 54474, '139'], [1347517443, 54451, '116'], [1347517463, 54471, '136'], [1347517385, 54393, '58'], [1347517387, 54395, '60'], [1347517373, 54381, '46'], [1347517372, 54380, '45'], [1347517418, 54426, '91'], [1347517420, 54428, '93'], [1347517469, 54477, '142]'], [1347517442, 54450, '115'], [1347517408, 54416, '81'], [1347517390, 54398, '63'], [1347517407, 54415, '80'], [1347517429, 54437, '102']]
print("Given array is")
print(seq);
print("\n")
print("Sorted array is")
print(mergesort(seq))
If you chage left[i][0] < right[j][0] to left[i][1] < right[j][1] then it will sort accrding to the second element in the inner array.
so I'm working on this referencing problem in which I need to get a key from a dictionary and its child keys(if any)
for example, I have a list of dictionary keys as such:
dict_keys(['1.', '1.1', '1.2', '1.2.1', '1.2.2', '1.2.2(a)', '1.2.2(b)', '1.2.2(c)', '1.2.2(d)', '1.2.3', '1.2.4', '1.2.5', '1.2.6', '1.2.7', '1.2.8', '1.2.9', '1.2.10', '1.2.11', '2.', '2.1', '3.', '3.1', '3.1.1', '3.1.2', '3.2', '3.3', '3.4', '3.5', '3.5.1', '3.5.2', '3.5.2(a)', '3.5.2(b)', '3.5.2(c)', '3.6', '3.7', '3.8', '3.9', '3.9.1', '3.9.2', '3.9.3', '3.10', '3.11', '3.11.1', '3.11.2', '4.', '4.1', '4.1.1', '4.1.2', '4.1.3', '4.1.4', '4.1.5', '4.1.6', '4.1.7', '4.1.8', '4.1.9', '4.1.10', '4.2', '5.', '5.1', '5.1.1', '5.1.2', '5.1.3', '5.1.4', '5.1.4(a)', '5.1.4(b)', '5.1.4(c)', '5.1.4(d)', '5.1.4(e)', '5.1.4(f)', '5.1.4(g)', '5.1.4(h)', '5.1.4(i)', '5.1.4(j)', '5.1.4(k)', '5.2', '5.3', '5.4', '6.', '6.1', '6.2', '7.', '7.1', '7.2', '7.3', '7.3.1', '7.3.2', '7.3.3', '7.4', '7.5', '7.6', '7.6.1', '7.6.2', '7.6.3', '7.6.4', '8.', '8.1', '8.2', '8.2.1', '8.2.2', '8.2.3', '8.2.4', '8.2.5', '8.2.6', '8.2.7', '8.3', '8.3.1', '8.3.2', '8.3.3', '8.3.4', '8.3.5', '8.3.6', '8.4', '9.', '9.1', '9.1.1', '9.1.2', '9.2', '9.3', '9.4', '9.5', '10.', '10.1', '10.2', '10.3', '10.4', '11.', '11.1', '11.2', '11.2.1', '11.2.2', '11.2.3', '11.3', '11.4', '11.5', '11.6', '11.7', '11.8', '11.9', '11.10', '12.', '12.1', '12.2', '12.2.1', '12.2.2', '12.3', '12.4', '12.4.1', '12.4.2', '12.4.3', '12.5', '12.6', '12.7', '12.8', '12.9', '12.10', '12.11', '13.', '13.1', '13.2', '13.3', '13.3.1', '13.3.2', '13.3.3', '13.3.4', '13.4', '13.4.1', '13.4.2', '13.4.3', '14.', '14.1', '14.2', '14.3', '15.', '15.1', '15.2', '15.2.1', '15.2.2', '15.2.3', '15.2.4', '16.', '16.1', '16.2', '16.3', '16.4', '16.5', '16.6', '16.7', '16.7.1', '16.7.2', '17.', '17.1', '17.1.1', '17.1.2', '17.1.3', '17.1.4', '17.2', '17.2.1', '17.2.2', '17.2.3', '17.2.4', '17.3', '17.4', '17.4.1', '17.4.2', '17.5', '17.5.1', '17.5.2', '17.5.3', '18.', '18.1', '18.2', '18.2.1', '18.2.1(a)', '18.2.1(b)', '18.2.1(c)', '18.2.2', '18.2.3', '18.2.4', '18.2.5', '18.2.5(a)', '18.2.5(b)', '18.2.6', '18.2.7', '19.', '19.1', '19.2', '19.3', '19.4', '20.', '21.', '21.1', '21.2', '21.3', '21.4', '22.', '22.1', '22.1.1', '22.1.2', '22.1.3', '22.1.4', '22.2', '22.2.1', '22.2.2', '22.3', '22.4', '22.5', '22.6', '22.6.1', '22.6.2', '22.7', '23.', '23.1', '23.1.1', '23.1.1(a)', '23.1.1(b)', '23.1.1(c)', '23.1.2', '23.1.3', '23.1.3(a)', '23.1.3(b)', '23.1.4', '23.2', '23.2.1', '23.2.2', '23.3', '23.4', '23.5', '24.', '24.1', '24.1.1', '24.1.2', '24.1.3', '24.1.4', '24.1.5', '24.1.6', '24.2', '24.3', '25.', '25.1', '25.1.1', '25.1.2', '25.1.3', '25.1.4', '25.1.5', '25.1.6', '25.2', '25.2.1', '25.2.2', '25.2.3', '25.2.4', '25.2.5', '25.3', '26.', '27.', '27.1', '27.2', '27.2.1', '27.2.2', '27.2.3', '27.3', '28.', '28.1', '28.2', '28.2.1', '28.2.2', '28.2.3', '28.2.4', '28.2.5', '28.3', '29.', '29.1', '29.1.1', '29.1.2', '29.1.3', '29.2', '30.', '30.1', '30.2', '30.2.1', '30.2.2', '30.2.3', '30.3', '30.4', '31.', '31.1', '31.2', '31.3', '31.4', '31.5', '31.5.1', '31.5.2', '31.6', '31.7', '31.8', '31.8.1', '31.8.2', '31.8.3', '31.9', '31.10', '31.11'])
in this, if I have something referred in key 3.1, I also want to extract the data for keys 3.1.1 and 3.1.2(child keys). so for this I used str.startwith() method which fails in cases as such.
It will not only return me the child clauses but also other clauses like 3.10, 3.11... which also start with 3.1, so this will result in some false positives and leakages.
The caveat however being that I don't know for sure that what numbering system will the user be using in the agreement.
for example:
it could be anything like:
['(1.)', '(1.)a.', '(2.)'] or ['1.', '1.1.', '1.1.1.', '1.1.1.a',] or ['1.', '1.1', '1.1.1', '1.1.1a']
so I'm trying to figure out a function using which I can cover all these ways.
How can I do that?
Thanks in advance! :)
You're much more familiar with your dataset and how robust you need to be, but I set that set of dict_keys to keys and used the following:
if key.count('.') < 2:
child_keys = [k for k in keys if k.startswith(key + '.')]
else:
child_keys = [k for k in keys if k.startswith(key + '(')]
If key = '1.2.2', then child_keys = ['1.2.2(a)', '1.2.2(b)', '1.2.2(c)', '1.2.2(d)']. If key = '3.1', then child_keys = ['3.1.1', '3.1.2'].
def get_subsequent_keys(key):
subsequent_keys = []
total_keys = structure.keys()
try:
if key[-1] == '.' or key[-1] == ')':
if key in total_keys:
pass
elif key[:-1] in total_keys:
subsequent_keys.append(key[:-1])
subsequent_keys += [x for x in total_keys if x.startswith(key)]
else:
key_pat = fr"({key}[a-z]+)"
subsequent_keys.append(key)
subsequent_keys += [x for x in total_keys if x.startswith(key + '.')]
subsequent_keys += [x for x in total_keys if x.startswith(key + '(')]
subsequent_keys += re.findall(key_pat, ' '.join(total_keys))
except Exception as exception:
logger.debug(exception)
finally:
return subsequent_keys
So I used this way to cover all the possible cases.
The problem with my use case is that I don't really have a definite way by which the user numbers the clauses in the agreement. Hence I needed a generic function to cover all ground. Also, thanks to Ava and Hayley for their help! :D
im fairly new to python and im not sure how to tackle my problem
im trying to make a program that can take a string of 15 characters from a .txt file and find words that you can make from those characters with a dictionary file, than output those words to another text file.
this is what i have tried:
attempting to find words that don't contain the characters and removing them from the list
various anagram solver type programs of git hub
i tried this sudo pip3 install anagram-solverbut it has been 3 hours on 15 characters and it is still running
im new so please tell me if im forgetting something
If you're looking for "perfect" anagrams, i.e. those that contain exactly the same number of characters, not a subset, it's pretty easy:
take your word-to-find, sort it by its letters
take your dictionary, sort each word by its letters
if the sorted versions match, they're anagrams
def find_anagrams(seek_word):
sorted_seek_word = sorted(seek_word.lower())
for word in open("/usr/share/dict/words"):
word = word.strip() # remove trailing newline
sorted_word = sorted(word.lower())
if sorted_word == sorted_seek_word and word != seek_word:
print(seek_word, word)
if __name__ == "__main__":
find_anagrams("begin")
find_anagrams("nicer")
find_anagrams("decor")
prints (on my macOS machine – Windows machines won't have /usr/share/dict/words by default, and some Linux distributions need it installed separately)
begin being
begin binge
nicer cerin
nicer crine
decor coder
decor cored
decor Credo
EDIT
A second variation that finds all words that are assemblable from the letters in the original word, using collections.Counter:
import collections
def find_all_anagrams(seek_word):
seek_word_counter = collections.Counter(seek_word.lower())
for word in open("/usr/share/dict/words"):
word = word.strip() # remove trailing newline
word_counter = collections.Counter(word.strip())
if word != seek_word and all(
n <= seek_word_counter[l] for l, n in word_counter.items()
):
yield word
if __name__ == "__main__":
print("decoration", set(find_all_anagrams("decoration")))
Outputs e.g.
decoration {'carte', 'drona', 'roit', 'oat', 'cantred', 'rond', 'rid', 'centroid', 'trine', 't', 'tenai', 'cond', 'toroid', 'recon', 'contra', 'dain', 'cootie', 'iao', 'arctoid', 'oner', 'indart', 'tine', 'nace', 'rident', 'cerotin', 'cran', 'eta', 'eoan', 'cardoon', 'tone', 'trend', 'trinode', 'coaid', 'ranid', 'rein', 'end', 'actine', 'ide', 'cero', 'iodate', 'corn', 'oer', 'retia', 'nidor', 'diter', 'drat', 'tec', 'tic', 'creat', 'arent', 'coon', 'doater', 'ornoite', 'terna', 'docent', 'tined', 'edit', 'octroi', 'eric', 'read', 'toned', 'c', 'tera', 'can', 'rocta', 'cortina', 'adonite', 'iced', 'no', 'natr', 'net', 'oe', 'rodeo', 'actor', 'otarine', 'on', 'cretin', 'ericad', 'dance', 'tornade', 'tinea', 'coontie', 'anerotic', 'acrite', 'ra', 'danio', 'inroad', 'inde', 'tied', 'tar', 'coronae', 'tid', 'rad', 'doc', 'derat', 'tea', 'acerin', 'ronde', 'recti', 'areito', 'drain', 'odontic', 'octoad', 'rio', 'actin', 'tread', 'rect', 'ariot', 'road', 'doctrine', 'enactor', 'indoor', 'toco', 'ton', 'trice', 'norite', 'nea', 'coda', 'noria', 'rot', 'trona', 'rice', 'arite', 'eria', 'orad', 'rate', 'toed', 'enact', 'crinet', 'cento', 'arid', 'coot', 'nat', 'nar', 'cain', 'at', 'antired', 'ear', 'triode', 'doter', 'cedarn', 'orna', 'rand', 'tari', 'crea', 'tiar', 'retan', 'tire', 'cora', 'aroid', 'iron', 'tenio', 'enroot', 'd', 'oaric', 'acetin', 'tain', 'neat', 'noter', 'tien', 'aortic', 'tode', 'dicer', 'irate', 'tie', 'canid', 'ado', 'noticer', 'arn', 'nacre', 'ceration', 'ratine', 'denaro', 'cotoin', 'aint', 'canto', 'cinter', 'decani', 'roon', 'donor', 'acnode', 'aide', 'doer', 'tacnode', 'oread', 'acetoin', 'rine', 'acton', 'conoid', 'a', 'otocrane', 'norate', 'care', 'ticer', 'io', 'detain', 'cedar', 'ta', 'toadier', 'atone', 'cornet', 'dacoit', 'toric', 'orate', 'arni', 'adroit', 'rend', 'tanier', 'rooted', 'doit', 'dier', 'odorate', 'trica', 'rated', 'cotonier', 'dine', 'roid', 'cairned', 'cat', 'i', 'coin', 'octine', 'trod', 'orc', 'cardo', 'eniac', 'arenoid', 'erd', 'creant', 'oda', 'ratio', 'ceria', 'ad', 'acorn', 'dorn', 'deric', 'credit', 'door', 'cinder', 'cantor', 'er', 'doon', 'coner', 'donate', 'roe', 'tora', 'antic', 'racoon', 'ooid', 'noa', 'tae', 'coroa', 'earn', 'retain', 'canted', 'norie', 'rota', 'tao', 'redan', 'rondo', 'entia', 'ctenoid', 'cent', 'daroo', 'inrooted', 'roed', 'adore', 'coat', 'e', 'rat', 'deair', 'arend', 'coir', 'acid', 'coronate', 'rodent', 'acider', 'iota', 'codo', 'redaction', 'cot', 'aeric', 'tonic', 'candier', 'decart', 'dicta', 'dot', 'recoat', 'caroon', 'rone', 'tarie', 'tarin', 'teca', 'oar', 'ocrea', 'ante', 'creation', 'tore', 'conto', 'tairn', 'roc', 'conter', 'coeditor', 'certain', 'roncet', 'decator', 'not', 'coatie', 'toran', 'caid', 'redia', 'root', 'cad', 'cartoon', 'n', 'coed', 'cand', 'neo', 'coronadite', 'dare', 'dartoic', 'acoin', 'detar', 'dite', 'trade', 'train', 'ordinate', 'racon', 'citron', 'dan', 'doat', 'nito', 'tercia', 'rote', 'cooer', 'acone', 'rita', 'caret', 'dern', 'enatic', 'too', 'cried', 'tade', 'dit', 'orient', 'ria', 'torn', 'coati', 'cnida', 'note', 'tried', 'acrid', 'nitro', 'acron', 'tern', 'one', 'it', 'naio', 'dor', 'ea', 'ca', 'ire', 'inert', 'orcanet', 'cine', 'coe', 'nardoo', 'deota', 'den', 'toi', 'adion', 'to', 'rite', 'nectar', 'rane', 'riant', 'cod', 'de', 'adit', 'airt', 'ie', 'retin', 'toon', 'cane', 'aeon', 'are', 'cointer', 'actioner', 'crin', 'detrain', 'art', 'cant', 'ort', 'tored', 'antoeci', 'tier', 'cite', 'onto', 'coater', 'tranced', 'atonic', 'roi', 'in', 'roan', 'decoat', 'rain', 'cronet', 'ronco', 'dont', 'citer', 'redact', 'cider', 'nor', 'octan', 'ration', 'doina', 'rie', 'aero', 'noted', 'crate', 'crain', 'cadet', 'condite', 'ran', 'odeon', 'date', 'eat', 'intoed', 'cation', 'carone', 'ratoon', 'retina', 'tiao', 'nice', 'nodi', 'codon', 'coo', 'torc', 'dent', 'entad', 'ne', 'toe', 'dae', 'decant', 'redcoat', 'coiner', 'irade', 'air', 'oint', 'coronet', 'radon', 'ce', 'octonare', 'oaten', 'citrean', 'dice', 'dancer', 'carotid', 'cretion', 'don', 'cion', 'nei', 'tead', 'nori', 'nacrite', 'ootid', 'rancid', 'dornic', 'orenda', 'cairn', 'aroon', 'coardent', 'aider', 'notice', 'cored', 'adorn', 'tad', 'carid', 'otic', 'dian', 'od', 'dint', 'tercio', 'die', 'conred', 'tice', 'rant', 'candor', 'anti', 'dar', 'antre', 'cornea', 'ordain', 'corona', 'recta', 'redo', 'tare', 'coranto', 'action', 'caird', 'creta', 'naid', 'tri', 'acre', 'crane', 'coated', 'citronade', 'anoetic', 'tenor', 'anode', 'triad', 'ceratoid', 'rod', 'idea', 'carton', 'cortin', 'endaortic', 'dicot', 'tend', 'da', 'tod', 'erotica', 'cord', 'coreid', 'toader', 'dace', 'tan', 'editor', 'rection', 'toner', 'cone', 'ni', 'tide', 'coder', 'din', 'ocote', 'ore', 'daer', 'octane', 'darn', 'do', 'reit', 'na', 'catenoid', 'tron', 'condor', 'crinated', 'cordon', 'crone', 'toad', 'noir', 'into', 'tirade', 'nadir', 'ant', 'ade', 'droit', 'icon', 'drone', 'ared', 'cardin', 'nid', 'dire', 'orcin', 'donator', 'rani', 'tane', 'ace', 'iodo', 'doria', 'ride', 'eon', 'ornate', 'cedrat', 'aire', 'carotin', 'dation', 'tear', 'onca', 'cote', 'taroc', 'con', 'nod', 'dinero', 'ecad', 'recant', 'ae', 'octad', 'cor', 'doctor', 'acridone', 'neti', 'cordite', 'crotin', 'aneroid', 'diota', 'coorie', 'dita', 'aconite', 'nard', 'cadent', 'ectad', 'rance', 'rea', 'tai', 'denat', 'rood', 'acne', 'decan', 'ani', 'rit', 'cit', 'cetin', 'odor', 'acorned', 'iceroot', 'inro', 'crood', 'daric', 'dacite', 'trone', 'acier', 'reina', 'oncia', 'drant', 'acrodont', 'nacred', 'cotrine', 'dinar', 'tean', 'atoner', 'toorie', 'nadorite', 'cardon', 'taen', 'tin', 'conte', 'acoine', 'dater', 'diact', 'aid', 'anodic', 'coronated', 'direct', 're', 'era', 'anticor', 'triace', 'octoid', 'dao', 'corta', 'edict', 'trode', 'ode', 'orant', 'niter', 'centrad', 'cater', 'tronc', 'coronad', 'r', 'toro', 'ar', 'once', 'ora', 'trace', 'creodont', 'erotic', 'ai', 'troca', 'ion', 'tecon', 'tra', 'acor', 'radio', 'acred', 'croon', 'tricae', 'recto', 'riden', 'andorite', 'taro', 'red', 'dear', 'ate', 'tinder', 'trin', 'deacon', 'ardent', 'aer', 'arc', 'crine', 'dart', 'diet', 'riot', 'tanrec', 'tor', 'noetic', 'ret', 'trance', 'ona', 'rind', 'coto', 'daoine', 'teind', 'toa', 'inter', 'code', 'cart', 'aion', 'detin', 'core', 'oont', 'rent', 'cedrin', 'card', 'trained', 'o', 'recoin', 'cro', 'and', 'diner', 'id', 'cordant', 'cedron', 'ditone', 'odic', 'cadi', 'cerin', 'nit', 'ecoid', 'nide', 'ean', 'andric', 'tind', 'raid', 'crena', 'oroide', 'roadite', 'canter', 'idant', 'cade', 'race', 'ten', 'caner', 'tarn', 'cooter', 'etna', 'tornadic', 'irone', 'ice', 'en', 'oord', 'oared', 'draine', 'cordate', 'react', 'reaction', 'tornado', 'troco', 'niota', 'carotenoid', 'an', 'cader', 'naric', 'car', 'centiar', 'ti', 'cearin', 'aroint', 'crined', 'iter', 'di', 'or', 'trio', 'dari', 'oration', 'orcein', 'coned', 'odorant', 'dean', 'coadore', 'cate', 'drate', 'dirten', 'ted', 'done', 'cadre', 'ocean', 'tired', 'adet', 'dirt', 'te', 'nae', 'ceti', 'cern', 'rotan', 'doe', 'roto', 'dote', 'node', 'ait', 'act', 'canoe', 'rode'}
I'm trying to make poker game in Python. In the while fuction I want to move the used cards in a separate(used cards) list. The problem is sometimes when I print the hand I can get duplicates. Something is wrong with my sorting strategy and I don't know what. Can you help me?
import random
deck = ['AS', 'KS', 'QS', 'JS', '10S', '9S', '8S', '7S', '6S', '5S', '4S', '3S', '2S',\
'AD', 'KD', 'QD', 'JD', '10D', '9D', '8D', '7D', '6D', '5D', '4D', '3D', '2D',\
'AC', 'KC', 'QC', 'JC', '10C', '9C', '8C', '7C', '6C', '5C', '4C', '3C', '2C',\
'AH', 'KH', 'QH', 'JH', '10H', '9H', '8H', '7H', '6H', '5H', '4H', '3H', '2H']
used = []
p1 = []
p2 = []
a = 0
while (a < 2):
drawn_card = random.choice(deck)
deck.append(drawn_card)
deck = [f for f in deck if f not in used]
p1.append(drawn_card)
a+=1
Well the random choice is not guaranteed to be unique, thus when you do:
drawn_card = random.choice(deck)
..
p1.append(drawn_card)
you may end up having duplicates (that explains that you some time see duplicates and some not).
Check if drawn_card is in the list first and if not, then append. That way you won't have duplicates. In code you could it like this:
if drawn_card not in p1:
p1.append(drawn_card)
Or, as Rory Daulton said:
If you are allowed, you could shuffle the entire deck, then remove consecutive items from that list.
you need to compare the random card with "p1" not with "deck":
import random
deck = ['AS', 'KS', 'QS', 'JS', '10S', '9S', '8S', '7S', '6S', '5S', '4S', '3S', '2S',\
'AD', 'KD', 'QD', 'JD', '10D', '9D', '8D', '7D', '6D', '5D', '4D', '3D', '2D',\
'AC', 'KC', 'QC', 'JC', '10C', '9C', '8C', '7C', '6C', '5C', '4C', '3C', '2C',\
'AH', 'KH', 'QH', 'JH', '10H', '9H', '8H', '7H', '6H', '5H', '4H', '3H', '2H']
used = []
p1 = []
a = 0
while (a < 2):
drawn_card = random.choice(deck)
print(drawn_card)
if drawn_card not in p1:
p1.append(drawn_card)
a += 1
continue
print (p1)
I would like to get lists of things, for example a list of names of rivers or a list of types of animals.
NLTK looks like it might be the thing for this, but I'm not sure how to do what I want. I'd like to have a function like:
get_list_of("river")
that would return something like
["amazon", "mississippi", "thames", ...]
I would suggest looking at NLTK wordnet API, see http://www.nltk.org/howto/wordnet.html.
But after doing some digging seems like Proper Nouns (i.e. names of river are not easy to track down in wordnet)
>>> from nltk.corpus import wordnet as wn
>>> wn.synsets('river')
[Synset('river.n.01')]
>>> wn.synset('river.n.01')
Synset('river.n.01')
>>> wn.synset('river.n.01').lemma_names
['river']
>>> wn.synsets('amazon')
[Synset('amazon.n.01'), Synset('amazon.n.02'), Synset('amazon.n.03'), Synset('amazon.n.04')]
>>> wn.synset('amazon.n.01').definition
'a large strong and aggressive woman'
>>> wn.synset('amazon.n.02').definition
'(Greek mythology) one of a nation of women warriors of Scythia (who burned off the right breast in order to use a bow and arrow more effectively)'
>>> wn.synset('amazon.n.03').definition
"a major South American river; arises in the Andes and flows eastward into the South Atlantic; the world's 2nd longest river (4000 miles)"
>>> wn.synset('amazon.n.04').definition
'mainly green tropical American parrots'
As a brute force way, look for "river" in a synsets' definitions, as such:
from itertools import chain
list(chain(*[i.lemma_names for i in wn.all_synsets() if "river" in i.definition]))
[out]:
['anaclinal', 'cataclinal', 'Acheronian', 'Acherontic', 'Stygian', 'hit-and-run', 'fluvial', 'riparian', 'Lao', 'debouch', 'rejuvenate', 'drive', 'ford', 'ascend', 'plant', 'drive', 'ford', 'fording', 'drive', 'driving', 'flood_control', 'conservancy', 'road_rage', 'Aegospotami', 'Aegospotamos', 'Yalu_River', 'three-spined_stickleback', 'Gasterosteus_aculeatus', 'ten-spined_stickleback', 'Gasterosteus_pungitius', 'placoderm', 'hellbender', 'mud_puppy', 'Cryptobranchus_alleganiensis', 'plains_spadefoot', 'Scaphiopus_bombifrons', 'mud_turtle', 'cooter', 'river_cooter', 'Pseudemys_concinna', 'spiny_softshell', 'Trionyx_spiniferus', 'smooth_softshell', 'Trionyx_muticus', 'teal', 'pintail', 'pin-tailed_duck', 'Anas_acuta', 'Ancylus', 'genus_Ancylus', 'freshwater_mussel', 'freshwater_clam', 'long-clawed_prawn', 'river_prawn', 'Palaemon_australis', 'Platanistidae', 'family_Platanistidae', 'hippopotamus', 'hippo', 'river_horse', 'Hippopotamus_amphibius', 'waterbuck', 'Australian_lungfish', 'Queensland_lungfish', 'Neoceratodus_forsteri', 'alewife', 'Alosa_pseudoharengus', 'Pomolobus_pseudoharengus', 'sockeye', 'sockeye_salmon', 'red_salmon', 'blueback_salmon', 'Oncorhynchus_nerka', 'brown_trout', 'salmon_trout', 'Salmo_trutta', 'Australian_arowana', 'Dawson_River_salmon', 'saratoga', 'spotted_barramundi', 'spotted_bonytongue', 'Scleropages_leichardti', 'Australian_bonytongue', 'northern_barramundi', 'Scleropages_jardinii', 'crappie', 'striped_bass', 'striper', 'Roccus_saxatilis', 'rockfish', 'bolti', 'Tilapia_nilotica', 'Chinese_paddlefish', 'Psephurus_gladis', 'air_bag', 'Augean_stables', 'barouche', 'bend', 'curve', 'boathouse', 'box', 'box_seat', 'brassie', 'bridge', 'span', 'bridle', 'brougham', 'buggy_whip', 'cab', 'car_mirror', 'coach', 'four-in-hand', 'coach-and-four', 'cockpit', 'death_seat', 'dredge', 'dredging_bucket', 'elbow', 'flat_tip_screwdriver', 'hansom', 'hansom_cab', 'keelboat', 'Lake_Volta', 'levee', 'levee', 'L-plate', 'machine_screw', 'outfall', 'Phillips_screwdriver', 'pull-in', 'pull-up', 'river_boat', 'showboat', 'skidpan', 'spiral_ratchet_screwdriver', 'ratchet_screwdriver', 'towpath', 'towing_path', 'truck_stop', 'willowware', 'willow-pattern', 'woodscrew', 'Copehan', 'Volgaic', 'horn', 'rip', 'riptide', 'tide_rip', 'crosscurrent', 'countercurrent', 'crappie', 'red_salmon', 'sockeye', 'sockeye_salmon', 'logjam', 'Teamsters_Union', 'car_pool', 'conservancy', 'headwater', 'river_basin', 'basin', 'watershed', 'drainage_basin', 'catchment_area', 'catchment_basin', 'drainage_area', 'confluence', 'meeting', 'Mammoth_Cave_National_Park', 'Zion_National_Park', 'watershed', 'water_parting', 'divide', 'Yangon', 'Rangoon', "N'Djamena", 'Ndjamena', 'Fort-Lamy', 'capital_of_Chad', 'Kinshasa', 'Leopoldville', 'Saxony', 'Sachsen', 'Saxe', 'Cologne', 'Koln', 'Mannheim', 'Rhineland', 'Rheinland', 'Ruhr', 'Ruhr_Valley', 'West_Bank', 'Pennines', 'Pennine_Chain', 'Ottawa', 'Canadian_capital', 'capital_of_Canada', 'Antwerpen', 'Antwerp', 'Anvers', 'Orleans', 'Rhone-Alpes', 'Friesland', 'Timbuktu', 'Bydgoszcz', 'Bromberg', 'Novosibirsk', 'Tbilisi', 'Tiflis', 'capital_of_Georgia', 'Toledo', 'Selma', 'Denver', 'Mile-High_City', 'capital_of_Colorado', 'Hartford', 'capital_of_Connecticut', 'Savannah', 'Topeka', 'capital_of_Kansas', 'Louisville', 'New_Orleans', 'Detroit', 'Motor_City', 'Motown', 'Minneapolis', 'Saint_Paul', 'St._Paul', 'capital_of_Minnesota', 'Jefferson_City', 'capital_of_Missouri', 'Saint_Louis', 'St._Louis', 'Gateway_to_the_West', 'Billings', 'Great_Falls', 'Omaha', 'Concord', 'capital_of_New_Hampshire', 'Manchester', 'Trenton', 'capital_of_New_Jersey', 'Albuquerque', 'New_Netherland', 'Albany', 'capital_of_New_York', 'Erie_Canal', 'New_York', 'New_York_City', 'Greater_New_York', 'West_Point', 'Niagara_Falls', 'Schenectady', 'Bismarck', 'capital_of_North_Dakota', 'Fargo', 'Cincinnati', 'Tulsa', 'Chester', 'Philadelphia', 'City_of_Brotherly_Love', 'Pierre', 'capital_of_South_Dakota', 'Mount_Vernon', 'Charleston', 'capital_of_West_Virginia', 'Huntington', 'Morgantown', 'Parkersburg', 'Wheeling', 'Casper', 'Ciudad_Bolivar', 'Aare', 'Aar', 'Aare_River', 'Acheron', 'River_Acheron', 'Adige', 'River_Adige', 'Aire', 'River_Aire', 'Aire_River', 'Alabama', 'Alabama_River', 'Allegheny', 'Allegheny_River', 'Amazon', 'Amazon_River', 'Amur', 'Amur_River', 'Heilong_Jiang', 'Heilong', 'Angara', 'Angara_River', 'Tunguska', 'Upper_Tunguska', 'Apalachicola', 'Apalachicola_River', 'Araguaia', 'Araguaia_River', 'Araguaya', 'Araguaya_River', 'Aras', 'Araxes', 'Arauca', 'Argun', 'Argun_River', 'Ergun_He', 'Arkansas', 'Arkansas_River', 'Arno', 'Arno_River', 'River_Arno', 'Avon', 'River_Avon', 'Upper_Avon', 'Upper_Avon_River', 'Avon', 'River_Avon', 'bar', 'Bighorn', 'Bighorn_River', 'Big_Sioux_River', 'billabong', 'bluff', 'body_of_water', 'water', 'bottomland', 'bottom', 'Brahmaputra', 'Brahmaputra_River', 'branch', 'Brazos', 'Brazos_River', 'brook', 'creek', 'Caloosahatchee', 'Caloosahatchee_River', 'Cam', 'River_Cam', 'Cam_River', 'Canadian', 'Canadian_River', 'canyon', 'canon', 'Cape_Fear_River', 'channel', 'Chao_Phraya', 'Charles', 'Charles_River', 'Chattahoochee', 'Chattahoochee_River', 'Cimarron', 'Cimarron_River', 'Clinch_River', 'Clyde', 'Cocytus', 'River_Cocytus', 'Colorado', 'Colorado_River', 'Colorado', 'Colorado_River', 'Columbia', 'Columbia_River', 'Congo', 'Congo_River', 'Zaire_River', 'Connecticut', 'Connecticut_River', 'Coosa', 'Coosa_River', 'Cumberland', 'Cumberland_River', 'dale', 'Danube', 'Danube_River', 'Danau', 'Darling', 'Darling_River', 'Delaware', 'Delaware_River', 'delta', 'Demerara', 'Detroit_River', 'distributary', 'Dnieper', 'Dnieper_River', 'Don', 'Don_River', 'Ebro', 'Ebro_River', 'Elbe', 'Elbe_River', 'Elizabeth_River', 'estuary', 'Euphrates', 'Euphrates_River', 'Flint', 'Flint_River', 'floodplain', 'flood_plain', 'Forth', 'Forth_River', 'Fox_River', 'Ganges', 'Ganges_River', 'Gan_Jiang', 'Kan_River', 'Garonne', 'Garonne_River', 'Gila', 'Gila_River', 'gorge', 'Grand_River', 'Green', 'Green_River', 'headstream', 'Housatonic', 'Housatonic_River', 'Huang_He', 'Hwang_Ho', 'Yellow_River', 'Hudson', 'Hudson_River', 'IJssel', 'IJssel_river', 'Illinois_River', 'Indigirka', 'Indigirka_River', 'Indus', 'Indus_River', 'Irrawaddy', 'Irrawaddy_River', 'Irtish', 'Irtish_River', 'Irtysh', 'Irtysh_River', 'Isere', 'Isere_River', 'James', 'James_River', 'James', 'James_River', 'Jordan', 'Jordan_River', 'Kansas', 'Kansas_River', 'Kaw_River', 'Kasai', 'Kasai_River', 'River_Kasai', 'Kissimmee', 'Kissimmee_River', 'Klamath', 'Klamath_River', 'Kura', 'Kura_River', 'Lake_Chad', 'Chad', 'Lehigh_River', 'Lena', 'Lena_River', 'Lethe', 'River_Lethe', 'liman', 'Limpopo', 'Crocodile_River', 'Little_Bighorn', 'Little_Bighorn_River', 'Little_Horn', 'Little_Missouri', 'Little_Missouri_River', 'Little_Sioux_River', 'Little_Wabash', 'Little_Wabash_River', 'Loire', 'Loire_River', 'Mackenzie', 'Mackenzie_River', 'Madeira', 'Madeira_River', 'Magdalena', 'Magdalena_River', 'meander', 'Mekong', 'Mekong_River', 'Merrimack', 'Merrimack_River', 'Meuse', 'Meuse_River', 'Milk', 'Milk_River', 'Mississippi', 'Mississippi_River', 'Missouri', 'Missouri_River', 'Mobile', 'Mobile_River', 'Mohawk_River', 'Monongahela', 'Monongahela_River', 'Moreau_River', 'Murray', 'Murray_River', 'Murrumbidgee', 'Murrumbidgee_River', 'Namoi', 'Namoi_River', 'Nan', 'Nan_River', 'Neckar', 'Neckar_River', 'Neosho', 'Neosho_River', 'Neva', 'Neva_River', 'New_River', 'Niagara', 'Niagara_River', 'Niger', 'Niger_River', 'Nile', 'Nile_River', 'North_Platte', 'North_Platte_River', 'Ob', 'Ob_River', 'Oder', 'Oder_River', 'Ohio', 'Ohio_River', 'Orange', 'Orange_River', 'Orinoco', 'Orinoco_River', 'Osage', 'Osage_River', 'Outaouais', 'Ottawa', 'Ottawa_river', 'Ouachita', 'Ouachita_River', 'Ouse', 'Ouse_River', 'oxbow', 'oxbow_lake', 'Parana', 'Parana_River', 'Parnaiba', 'Parnahiba', 'Pearl_River', 'Pee_Dee', 'Pee_Dee_River', 'Penobscot', 'Penobscot_River', 'Ping', 'Ping_River', 'Platte', 'Platte_River', 'Po', 'Po_River', 'Potomac', 'Potomac_River', 'Purus', 'Purus_River', 'rapid', 'Rappahannock', 'Rappahannock_River', 'Rhine', 'Rhine_River', 'Rhein', 'Rhone', 'Rhone_River', 'Rio_Grande', 'Rio_Bravo', 'riparian_forest', 'riverbank', 'riverside', 'riverbed', 'river_bottom', 'river_boulder', 'Russian_River', 'Saale', 'Saale_River', 'Sabine', 'Sabine_River', 'Sacramento_River', 'Saint_John', 'Saint_John_River', 'St._John', 'St._John_River', 'Saint_Johns', 'Saint_Johns_River', 'St._Johns', 'St._Johns_River', 'Saint_Lawrence', 'Saint_Lawrence_River', 'St._Lawrence', 'St._Lawrence_River', 'Sambre', 'Sambre_River', 'sandbank', 'San_Joaquin_River', 'Sao_Francisco', 'Saone', 'Saone_River', 'Savannah', 'Savannah_River', 'Scheldt', 'Scheldt_River', 'Seine', 'Seine_River', 'Severn', 'River_Severn', 'Severn_River', 'Severn', 'Severn_River', 'Seyhan', 'Seyhan_River', 'Shari', 'Shari_River', 'Chari', 'Chari_River', 'Shenandoah_River', 'Styx', 'River_Styx', 'Sun_River', 'Suriname_River', 'Surinam_River', 'Susquehanna', 'Susquehanna_River', 'Tagus', 'Tagus_River', 'Tallapoosa', 'Tallapoosa_River', 'Tennessee', 'Tennessee_River', 'Thames', 'River_Thames', 'Thames_River', 'Tiber', 'Tevere', 'Tigris', 'Tigris_River', 'Tocantins', 'Tocantins_River', 'Tombigbee', 'Tombigbee_River', 'Trent', 'River_Trent', 'Trent_River', 'Trinity_River', 'Tunguska', 'Lower_Tunguska', 'Tunguska', 'Stony_Tunguska', 'Tyne', 'River_Tyne', 'Tyne_River', 'Urubupunga', 'Urubupunga_Falls', 'Uruguay_River', 'valley', 'vale', 'Vetluga', 'Vetluga_River', 'Vistula', 'Vistula_River', 'Volga', 'Volga_River', 'Volkhov', 'Volkhov_River', 'Volta', 'waterfall', 'falls', 'water_system', 'Weser', 'Weser_River', 'Willamette', 'Willamette_River', 'Yalu', 'Yalu_River', 'Chang_Jiang', 'Changjiang', 'Chang', 'Yangtze', 'Yangtze_River', 'Yangtze_Kiang', 'Yazoo', 'Yazoo_River', 'Yenisei', 'Yenisei_River', 'Yenisey', 'Yenisey_River', 'Yukon', 'Yukon_River', 'Zambezi', 'Zambezi_River', 'Zhu_Jiang', 'Canton_River', 'Chu_Kiang', 'Pearl_River', 'Charon', 'naiad', 'Achilles', 'finisher', 'Algonkian', 'Algonkin', 'Arikara', 'Aricara', 'Chinook', 'Conoy', 'Halchidhoma', 'Hidatsa', 'Gros_Ventre', 'Kansa', 'Kansas', 'Karok', 'Maidu', 'Maricopa', 'Missouri', 'Mohave', 'Mojave', 'Ofo', 'Omaha', 'Maha', 'Osage', 'Oto', 'Otoe', 'Pamlico', 'Ponca', 'Ponka', 'Quapaw', 'Shahaptian', 'Sahaptin', 'Sahaptino', 'Shawnee', 'Tsimshian', 'Walapai', 'Hualapai', 'Hualpai', 'Yeniseian', 'Yakut', 'charioteer', 'driver', 'honker', 'lasher', 'mahout', 'nondriver', 'road_hog', 'roadhog', 'speeder', 'speed_demon', 'tailgater', 'teamster', 'test_driver', 'wagoner', 'waggoner', 'Cartier', 'Jacques_Cartier', 'Oldfield', 'Barney_Oldfield', 'Berna_Eli_Oldfield', 'debacle', 'bald_cypress', 'swamp_cypress', 'pond_bald_cypress', 'southern_cypress', 'Taxodium_distichum', 'Montezuma_cypress', 'Mexican_swamp_cypress', 'Taxodium_mucronatum', 'pistia', 'water_lettuce', 'water_cabbage', 'Pistia_stratiotes', 'Pistia_stratoites', 'great_yellowcress', 'Rorippa_amphibia', 'Nasturtium_amphibium', 'giant_reed', 'Arundo_donax', 'Phragmites', 'genus_Phragmites', 'black_birch', 'river_birch', 'red_birch', 'Betula_nigra', 'river_red_gum', 'river_gum', 'Eucalyptus_camaldulensis', 'Eucalyptus_rostrata', 'false_indigo', 'bastard_indigo', 'Amorpha_fruticosa', 'thermal_pollution', 'water_pollution', 'alluvial_soil', 'Senegal_gum', 'silt']
But it seems like there are still some noise from the "brute force" method. Let's try to assume that if it is the name of the river it should start with an uppercase, so let's try:
list(chain(*[ [j for j in i.lemma_names if j[0].isupper()] for i in wn.all_synsets() if "river" in i.definition]))
[out]:
['Acheronian', 'Acherontic', 'Stygian', 'Lao', 'Aegospotami', 'Aegospotamos', 'Yalu_River', 'Gasterosteus_aculeatus', 'Gasterosteus_pungitius', 'Cryptobranchus_alleganiensis', 'Scaphiopus_bombifrons', 'Pseudemys_concinna', 'Trionyx_spiniferus', 'Trionyx_muticus', 'Anas_acuta', 'Ancylus', 'Palaemon_australis', 'Platanistidae', 'Hippopotamus_amphibius', 'Australian_lungfish', 'Queensland_lungfish', 'Neoceratodus_forsteri', 'Alosa_pseudoharengus', 'Pomolobus_pseudoharengus', 'Oncorhynchus_nerka', 'Salmo_trutta', 'Australian_arowana', 'Dawson_River_salmon', 'Scleropages_leichardti', 'Australian_bonytongue', 'Scleropages_jardinii', 'Roccus_saxatilis', 'Tilapia_nilotica', 'Chinese_paddlefish', 'Psephurus_gladis', 'Augean_stables', 'Lake_Volta', 'L-plate', 'Phillips_screwdriver', 'Copehan', 'Volgaic', 'Teamsters_Union', 'Mammoth_Cave_National_Park', 'Zion_National_Park', 'Yangon', 'Rangoon', "N'Djamena", 'Ndjamena', 'Fort-Lamy', 'Kinshasa', 'Leopoldville', 'Saxony', 'Sachsen', 'Saxe', 'Cologne', 'Koln', 'Mannheim', 'Rhineland', 'Rheinland', 'Ruhr', 'Ruhr_Valley', 'West_Bank', 'Pennines', 'Pennine_Chain', 'Ottawa', 'Canadian_capital', 'Antwerpen', 'Antwerp', 'Anvers', 'Orleans', 'Rhone-Alpes', 'Friesland', 'Timbuktu', 'Bydgoszcz', 'Bromberg', 'Novosibirsk', 'Tbilisi', 'Tiflis', 'Toledo', 'Selma', 'Denver', 'Mile-High_City', 'Hartford', 'Savannah', 'Topeka', 'Louisville', 'New_Orleans', 'Detroit', 'Motor_City', 'Motown', 'Minneapolis', 'Saint_Paul', 'St._Paul', 'Jefferson_City', 'Saint_Louis', 'St._Louis', 'Gateway_to_the_West', 'Billings', 'Great_Falls', 'Omaha', 'Concord', 'Manchester', 'Trenton', 'Albuquerque', 'New_Netherland', 'Albany', 'Erie_Canal', 'New_York', 'New_York_City', 'Greater_New_York', 'West_Point', 'Niagara_Falls', 'Schenectady', 'Bismarck', 'Fargo', 'Cincinnati', 'Tulsa', 'Chester', 'Philadelphia', 'City_of_Brotherly_Love', 'Pierre', 'Mount_Vernon', 'Charleston', 'Huntington', 'Morgantown', 'Parkersburg', 'Wheeling', 'Casper', 'Ciudad_Bolivar', 'Aare', 'Aar', 'Aare_River', 'Acheron', 'River_Acheron', 'Adige', 'River_Adige', 'Aire', 'River_Aire', 'Aire_River', 'Alabama', 'Alabama_River', 'Allegheny', 'Allegheny_River', 'Amazon', 'Amazon_River', 'Amur', 'Amur_River', 'Heilong_Jiang', 'Heilong', 'Angara', 'Angara_River', 'Tunguska', 'Upper_Tunguska', 'Apalachicola', 'Apalachicola_River', 'Araguaia', 'Araguaia_River', 'Araguaya', 'Araguaya_River', 'Aras', 'Araxes', 'Arauca', 'Argun', 'Argun_River', 'Ergun_He', 'Arkansas', 'Arkansas_River', 'Arno', 'Arno_River', 'River_Arno', 'Avon', 'River_Avon', 'Upper_Avon', 'Upper_Avon_River', 'Avon', 'River_Avon', 'Bighorn', 'Bighorn_River', 'Big_Sioux_River', 'Brahmaputra', 'Brahmaputra_River', 'Brazos', 'Brazos_River', 'Caloosahatchee', 'Caloosahatchee_River', 'Cam', 'River_Cam', 'Cam_River', 'Canadian', 'Canadian_River', 'Cape_Fear_River', 'Chao_Phraya', 'Charles', 'Charles_River', 'Chattahoochee', 'Chattahoochee_River', 'Cimarron', 'Cimarron_River', 'Clinch_River', 'Clyde', 'Cocytus', 'River_Cocytus', 'Colorado', 'Colorado_River', 'Colorado', 'Colorado_River', 'Columbia', 'Columbia_River', 'Congo', 'Congo_River', 'Zaire_River', 'Connecticut', 'Connecticut_River', 'Coosa', 'Coosa_River', 'Cumberland', 'Cumberland_River', 'Danube', 'Danube_River', 'Danau', 'Darling', 'Darling_River', 'Delaware', 'Delaware_River', 'Demerara', 'Detroit_River', 'Dnieper', 'Dnieper_River', 'Don', 'Don_River', 'Ebro', 'Ebro_River', 'Elbe', 'Elbe_River', 'Elizabeth_River', 'Euphrates', 'Euphrates_River', 'Flint', 'Flint_River', 'Forth', 'Forth_River', 'Fox_River', 'Ganges', 'Ganges_River', 'Gan_Jiang', 'Kan_River', 'Garonne', 'Garonne_River', 'Gila', 'Gila_River', 'Grand_River', 'Green', 'Green_River', 'Housatonic', 'Housatonic_River', 'Huang_He', 'Hwang_Ho', 'Yellow_River', 'Hudson', 'Hudson_River', 'IJssel', 'IJssel_river', 'Illinois_River', 'Indigirka', 'Indigirka_River', 'Indus', 'Indus_River', 'Irrawaddy', 'Irrawaddy_River', 'Irtish', 'Irtish_River', 'Irtysh', 'Irtysh_River', 'Isere', 'Isere_River', 'James', 'James_River', 'James', 'James_River', 'Jordan', 'Jordan_River', 'Kansas', 'Kansas_River', 'Kaw_River', 'Kasai', 'Kasai_River', 'River_Kasai', 'Kissimmee', 'Kissimmee_River', 'Klamath', 'Klamath_River', 'Kura', 'Kura_River', 'Lake_Chad', 'Chad', 'Lehigh_River', 'Lena', 'Lena_River', 'Lethe', 'River_Lethe', 'Limpopo', 'Crocodile_River', 'Little_Bighorn', 'Little_Bighorn_River', 'Little_Horn', 'Little_Missouri', 'Little_Missouri_River', 'Little_Sioux_River', 'Little_Wabash', 'Little_Wabash_River', 'Loire', 'Loire_River', 'Mackenzie', 'Mackenzie_River', 'Madeira', 'Madeira_River', 'Magdalena', 'Magdalena_River', 'Mekong', 'Mekong_River', 'Merrimack', 'Merrimack_River', 'Meuse', 'Meuse_River', 'Milk', 'Milk_River', 'Mississippi', 'Mississippi_River', 'Missouri', 'Missouri_River', 'Mobile', 'Mobile_River', 'Mohawk_River', 'Monongahela', 'Monongahela_River', 'Moreau_River', 'Murray', 'Murray_River', 'Murrumbidgee', 'Murrumbidgee_River', 'Namoi', 'Namoi_River', 'Nan', 'Nan_River', 'Neckar', 'Neckar_River', 'Neosho', 'Neosho_River', 'Neva', 'Neva_River', 'New_River', 'Niagara', 'Niagara_River', 'Niger', 'Niger_River', 'Nile', 'Nile_River', 'North_Platte', 'North_Platte_River', 'Ob', 'Ob_River', 'Oder', 'Oder_River', 'Ohio', 'Ohio_River', 'Orange', 'Orange_River', 'Orinoco', 'Orinoco_River', 'Osage', 'Osage_River', 'Outaouais', 'Ottawa', 'Ottawa_river', 'Ouachita', 'Ouachita_River', 'Ouse', 'Ouse_River', 'Parana', 'Parana_River', 'Parnaiba', 'Parnahiba', 'Pearl_River', 'Pee_Dee', 'Pee_Dee_River', 'Penobscot', 'Penobscot_River', 'Ping', 'Ping_River', 'Platte', 'Platte_River', 'Po', 'Po_River', 'Potomac', 'Potomac_River', 'Purus', 'Purus_River', 'Rappahannock', 'Rappahannock_River', 'Rhine', 'Rhine_River', 'Rhein', 'Rhone', 'Rhone_River', 'Rio_Grande', 'Rio_Bravo', 'Russian_River', 'Saale', 'Saale_River', 'Sabine', 'Sabine_River', 'Sacramento_River', 'Saint_John', 'Saint_John_River', 'St._John', 'St._John_River', 'Saint_Johns', 'Saint_Johns_River', 'St._Johns', 'St._Johns_River', 'Saint_Lawrence', 'Saint_Lawrence_River', 'St._Lawrence', 'St._Lawrence_River', 'Sambre', 'Sambre_River', 'San_Joaquin_River', 'Sao_Francisco', 'Saone', 'Saone_River', 'Savannah', 'Savannah_River', 'Scheldt', 'Scheldt_River', 'Seine', 'Seine_River', 'Severn', 'River_Severn', 'Severn_River', 'Severn', 'Severn_River', 'Seyhan', 'Seyhan_River', 'Shari', 'Shari_River', 'Chari', 'Chari_River', 'Shenandoah_River', 'Styx', 'River_Styx', 'Sun_River', 'Suriname_River', 'Surinam_River', 'Susquehanna', 'Susquehanna_River', 'Tagus', 'Tagus_River', 'Tallapoosa', 'Tallapoosa_River', 'Tennessee', 'Tennessee_River', 'Thames', 'River_Thames', 'Thames_River', 'Tiber', 'Tevere', 'Tigris', 'Tigris_River', 'Tocantins', 'Tocantins_River', 'Tombigbee', 'Tombigbee_River', 'Trent', 'River_Trent', 'Trent_River', 'Trinity_River', 'Tunguska', 'Lower_Tunguska', 'Tunguska', 'Stony_Tunguska', 'Tyne', 'River_Tyne', 'Tyne_River', 'Urubupunga', 'Urubupunga_Falls', 'Uruguay_River', 'Vetluga', 'Vetluga_River', 'Vistula', 'Vistula_River', 'Volga', 'Volga_River', 'Volkhov', 'Volkhov_River', 'Volta', 'Weser', 'Weser_River', 'Willamette', 'Willamette_River', 'Yalu', 'Yalu_River', 'Chang_Jiang', 'Changjiang', 'Chang', 'Yangtze', 'Yangtze_River', 'Yangtze_Kiang', 'Yazoo', 'Yazoo_River', 'Yenisei', 'Yenisei_River', 'Yenisey', 'Yenisey_River', 'Yukon', 'Yukon_River', 'Zambezi', 'Zambezi_River', 'Zhu_Jiang', 'Canton_River', 'Chu_Kiang', 'Pearl_River', 'Charon', 'Achilles', 'Algonkian', 'Algonkin', 'Arikara', 'Aricara', 'Chinook', 'Conoy', 'Halchidhoma', 'Hidatsa', 'Gros_Ventre', 'Kansa', 'Kansas', 'Karok', 'Maidu', 'Maricopa', 'Missouri', 'Mohave', 'Mojave', 'Ofo', 'Omaha', 'Maha', 'Osage', 'Oto', 'Otoe', 'Pamlico', 'Ponca', 'Ponka', 'Quapaw', 'Shahaptian', 'Sahaptin', 'Sahaptino', 'Shawnee', 'Tsimshian', 'Walapai', 'Hualapai', 'Hualpai', 'Yeniseian', 'Yakut', 'Cartier', 'Jacques_Cartier', 'Oldfield', 'Barney_Oldfield', 'Berna_Eli_Oldfield', 'Taxodium_distichum', 'Montezuma_cypress', 'Mexican_swamp_cypress', 'Taxodium_mucronatum', 'Pistia_stratiotes', 'Pistia_stratoites', 'Rorippa_amphibia', 'Nasturtium_amphibium', 'Arundo_donax', 'Phragmites', 'Betula_nigra', 'Eucalyptus_camaldulensis', 'Eucalyptus_rostrata', 'Amorpha_fruticosa', 'Senegal_gum']
Let's go crazy and say that only if the word "River" appears in the lemma, it is a river:
>>> list(chain(*[ [j for j in i.lemma_names if j[0].isupper() and "River" in j] for i in wn.all_synsets() if "river" in i.definition]))
[out]:
['Yalu_River', 'Dawson_River_salmon', 'Aare_River', 'River_Acheron', 'River_Adige', 'River_Aire', 'Aire_River', 'Alabama_River', 'Allegheny_River', 'Amazon_River', 'Amur_River', 'Angara_River', 'Apalachicola_River', 'Araguaia_River', 'Araguaya_River', 'Argun_River', 'Arkansas_River', 'Arno_River', 'River_Arno', 'River_Avon', 'Upper_Avon_River', 'River_Avon', 'Bighorn_River', 'Big_Sioux_River', 'Brahmaputra_River', 'Brazos_River', 'Caloosahatchee_River', 'River_Cam', 'Cam_River', 'Canadian_River', 'Cape_Fear_River', 'Charles_River', 'Chattahoochee_River', 'Cimarron_River', 'Clinch_River', 'River_Cocytus', 'Colorado_River', 'Colorado_River', 'Columbia_River', 'Congo_River', 'Zaire_River', 'Connecticut_River', 'Coosa_River', 'Cumberland_River', 'Danube_River', 'Darling_River', 'Delaware_River', 'Detroit_River', 'Dnieper_River', 'Don_River', 'Ebro_River', 'Elbe_River', 'Elizabeth_River', 'Euphrates_River', 'Flint_River', 'Forth_River', 'Fox_River', 'Ganges_River', 'Kan_River', 'Garonne_River', 'Gila_River', 'Grand_River', 'Green_River', 'Housatonic_River', 'Yellow_River', 'Hudson_River', 'Illinois_River', 'Indigirka_River', 'Indus_River', 'Irrawaddy_River', 'Irtish_River', 'Irtysh_River', 'Isere_River', 'James_River', 'James_River', 'Jordan_River', 'Kansas_River', 'Kaw_River', 'Kasai_River', 'River_Kasai', 'Kissimmee_River', 'Klamath_River', 'Kura_River', 'Lehigh_River', 'Lena_River', 'River_Lethe', 'Crocodile_River', 'Little_Bighorn_River', 'Little_Missouri_River', 'Little_Sioux_River', 'Little_Wabash_River', 'Loire_River', 'Mackenzie_River', 'Madeira_River', 'Magdalena_River', 'Mekong_River', 'Merrimack_River', 'Meuse_River', 'Milk_River', 'Mississippi_River', 'Missouri_River', 'Mobile_River', 'Mohawk_River', 'Monongahela_River', 'Moreau_River', 'Murray_River', 'Murrumbidgee_River', 'Namoi_River', 'Nan_River', 'Neckar_River', 'Neosho_River', 'Neva_River', 'New_River', 'Niagara_River', 'Niger_River', 'Nile_River', 'North_Platte_River', 'Ob_River', 'Oder_River', 'Ohio_River', 'Orange_River', 'Orinoco_River', 'Osage_River', 'Ouachita_River', 'Ouse_River', 'Parana_River', 'Pearl_River', 'Pee_Dee_River', 'Penobscot_River', 'Ping_River', 'Platte_River', 'Po_River', 'Potomac_River', 'Purus_River', 'Rappahannock_River', 'Rhine_River', 'Rhone_River', 'Russian_River', 'Saale_River', 'Sabine_River', 'Sacramento_River', 'Saint_John_River', 'St._John_River', 'Saint_Johns_River', 'St._Johns_River', 'Saint_Lawrence_River', 'St._Lawrence_River', 'Sambre_River', 'San_Joaquin_River', 'Saone_River', 'Savannah_River', 'Scheldt_River', 'Seine_River', 'River_Severn', 'Severn_River', 'Severn_River', 'Seyhan_River', 'Shari_River', 'Chari_River', 'Shenandoah_River', 'River_Styx', 'Sun_River', 'Suriname_River', 'Surinam_River', 'Susquehanna_River', 'Tagus_River', 'Tallapoosa_River', 'Tennessee_River', 'River_Thames', 'Thames_River', 'Tigris_River', 'Tocantins_River', 'Tombigbee_River', 'River_Trent', 'Trent_River', 'Trinity_River', 'River_Tyne', 'Tyne_River', 'Uruguay_River', 'Vetluga_River', 'Vistula_River', 'Volga_River', 'Volkhov_River', 'Weser_River', 'Willamette_River', 'Yalu_River', 'Yangtze_River', 'Yazoo_River', 'Yenisei_River', 'Yenisey_River', 'Yukon_River', 'Zambezi_River', 'Canton_River', 'Pearl_River']
Much better but i think you're better off just crawling the names from http://en.wikipedia.org/wiki/Lists_of_rivers . Have fun!
To show that the solution for "river" using NLTK wordnet won't scale other entities, and also answer #tripleee's question.
If you're looking for animals, you can simply recursively get all hyponyms of animals, as such:
list(set([w for s in vehicle.closure(lambda s:s.hyponyms()) for w in s.lemma_names]))
We can use sentence tagging to get proper nouns. Filter them for required output.
Perhaps:
from nltk.tag import pos_tag
sentence = "Amazon is great river. Mississippi is awesome too."
tagged_sent = pos_tag(sentence.split())
tagged_sent will yield similar tagged out where NNP is the proper noun.
[('Amazon', 'NNP'), ('is', 'VBZ'), ('great', 'JJ'), ('river.', 'NNP'), ('Mississippi', 'NNP'), ('is', 'VBZ'), ('awesome', 'VBN'), ('too.', '-NONE-')]
propernouns = [word for word,pos in tagged_sent if pos == 'NNP']
propernouns would return
['Amazon', 'river.', 'Mississippi']
You can set categories for each and use a function to return them.
If you want to get lists of things of a specific type e.g. rivers http://dbPedia.org, http://freebase.com or http://wikidata.org are the better choice.
This dbPedia SPARQL query returns all rivers known to Wikipedia:
SELECT ?name ?description WHERE {
{?river rdf:type dbpedia-owl:River} .
?river foaf:name ?name .
?river rdfs:comment ?description .
}
ORDER BY ?name
http://bit.ly/1jc8Ip6