Complex dictionary sorting

I have a dictionary whose keys are words and whose values are numbers. I want to output the 10 keys with the largest values, but several keys share the same value. How do I display the keys ordered by value, with keys that share a value sorted alphabetically among themselves?
Here is my dictionary, as promised:
{'callooh': 1, 'all': 2, 'beware': 1, 'through': 3, 'eyes': 1, 'its': 1, 'callay': 1,
'jubjub': 1, 'to': 1, 'frumious': 1, 'wood': 1, 'tulgey': 1, 'has': 1, 'his': 2,
'"beware': 1, 'one': 2, 'day': 1, 'mome': 2, 'uffish': 1, 'manxome': 1, 'did': 2,
'galumphing': 1, 'whiffling': 1, '`twas': 1, 'went': 2, 'outgrabe': 2, 'slithy': 2,
'blade': 1, 'bandersnatch!"': 1, 'jaws': 1, 'snicker-snack': 1, 'back': 1, 'dead': 1,
'stood': 2, 'foe': 1, 'bird': 1, 'claws': 1, 'joy': 1, 'shun': 1, 'come': 1, 'by': 1,
'boy': 1, 'raths': 2, 'thou': 1, 'of': 1, 'o': 1, 'toves': 2, 'son': 1, '"and': 1,
'slain': 1, 'twas': 1, 'brillig': 2, 'bite': 1, 'two': 2, 'long': 1, 'head': 1, 'that': 2,
'took': 1, 'vorpal': 2, 'arms': 1, 'catch': 1, 'with': 2, 'he': 7, 'wabe': 2,
'tree': 1, 'flame': 1, 'were': 2, 'chortled': 1, 'beamish': 1, 'and': 13,
'gimble': 2, 'it': 2, 'as': 2, 'in': 6, 'sought': 1, 'my': 3, 'awhile': 1, 'mimsy': 2,
'sword': 1, 'borogoves': 2, 'hand': 1, 'rested': 1, 'frabjous': 1, 'gyre': 2,
'tumtum': 1, 'thought': 2, 'so': 1, 'time': 1, 'jabberwock': 3, 'the': 19,
'burbled': 1, 'came': 2, 'left': 1}

>>> from itertools import islice, chain, repeat
>>> food = {1: ['apple', 'chai', 'coffe', 'dom banana'], 2: ['pie', 'tea'], 3: ['bacon', 'pepsi'], 4: ['strawberry'], 5: ['egg'], 7: ['cake', 'ham'], 9: ['milk', 'mocha'], 10: ['pear'], 11: ['chicken', 'latte'], 13: ['coke'], 20: ['chocolate']}
>>> list(islice(chain.from_iterable(repeat(k, len(v))
...     for k, v in sorted(food.items(), reverse=True)), 10))
[20, 13, 11, 11, 10, 9, 9, 7, 7, 5]

I'm not sure I completely understand, but you can try something like:
# Assuming the data you're working with is something like:
>>> d = {'apple': 10, 'banana': 10, 'pear': 5, 'peach': 35, 'plum': 17, 'tomato': 17}
# Use - to order by values descending, key ordering will still be ascending.
>>> sorted(d.items(), key = lambda kv: (-kv[1], kv[0]))
[('peach', 35),
('plum', 17),
('tomato', 17),
('apple', 10),
('banana', 10),
('pear', 5)]
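To get only the top 10 from that ordering, slicing the sorted list should be enough. A minimal sketch, assuming a small subset of the counts from the question (the names counts, ranked and top_10 are just for illustration):

```python
# A subset of the word counts from the question, used as sample data.
counts = {'the': 19, 'and': 13, 'he': 7, 'in': 6, 'through': 3, 'my': 3,
          'jabberwock': 3, 'all': 2, 'his': 2, 'one': 2, 'mome': 2, 'did': 2}

# Sort by descending count, then alphabetically within equal counts.
ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))

# Keep only the first ten (word, count) pairs.
top_10 = ranked[:10]
for word, count in top_10:
    print(word, count)
```

Words that tie on count ('jabberwock', 'my', 'through') come out in alphabetical order, and the slice simply cuts off wherever the tenth entry falls, even if that is in the middle of a group of ties.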

Related

Sort the keys of a dictionary by key using a list and for loop [duplicate]

I need to sort this dictionary that counts the times that some words appear in a song:
word_freq = {'love': 25, 'conversation': 1, 'every': 6, "we're": 1, 'plate': 1, 'sour': 1, 'jukebox': 1, 'now': 11, 'taxi': 1, 'fast': 1, 'bag': 1, 'man': 1, 'push': 3, 'baby': 14, 'going': 1, 'you': 16, "don't": 2, 'one': 1, 'mind': 2, 'backseat': 1, 'friends': 1, 'then': 3, 'know': 2, 'take': 1, 'play': 1, 'okay': 1, 'so': 2, 'begin': 1, 'start': 2, 'over': 1, 'body': 17, 'boy': 2, 'just': 1, 'we': 7, 'are': 1, 'girl': 2, 'tell': 1, 'singing': 2, 'drinking': 1, 'put': 3, 'our': 1, 'where': 1, "i'll": 1, 'all': 1, "isn't": 1, 'make': 1, 'lover': 1, 'get': 1, 'radio': 1, 'give': 1, "i'm": 23, 'like': 10, 'can': 1, 'doing': 2, 'with': 22, 'club': 1, 'come': 37, 'it': 1, 'somebody': 2, 'handmade': 2, 'out': 1, 'new': 6, 'room': 3, 'chance': 1, 'follow': 6, 'in': 27, 'may': 2, 'brand': 6, 'that': 2, 'magnet': 3, 'up': 3, 'first': 1, 'and': 23, 'pull': 3, 'of': 6, 'table': 1, 'much': 2, 'last': 3, 'i': 6, 'thrifty': 1, 'grab': 2, 'was': 2, 'driver': 1, 'slow': 1, 'dance': 1, 'the': 18, 'say': 2, 'trust': 1, 'family': 1, 'week': 1, 'date': 1, 'me': 10, 'do': 3, 'waist': 2, 'smell': 3, 'day': 6, 'although': 3, 'your': 21, 'leave': 1, 'want': 2, "let's": 2, 'lead': 6, 'at': 1, 'hand': 1, 'how': 1, 'talk': 4, 'not': 2, 'eat': 1, 'falling': 3, 'about': 1, 'story': 1, 'sweet': 1, 'best': 1, 'crazy': 2, 'let': 1, 'too': 5, 'van': 1, 'shots': 1, 'go': 2, 'to': 2, 'a': 8, 'my': 33, 'is': 5, 'place': 1, 'find': 1, 'shape': 6, 'on': 40, 'kiss': 1, 'were': 3, 'night': 3, 'heart': 3, 'for': 3, 'discovering': 6, 'something': 6, 'be': 16, 'bedsheets': 3, 'fill': 2, 'hours': 2, 'stop': 1, 'bar': 1}
In order to do it I need:
To create a new list just with the keys of the dictionary.
keys = list(word_freq.keys())
Sort the key list.
keys.sort()
Create an empty dictionary.
word_freq2 = {}
Use a for loop to iterate over each key in the list. For each key, find the corresponding value in the first dictionary and insert the key-value pair into the new, empty dictionary.
This is my best solution so far:
for key in keys:
    if key in word_freq:
        word_freq2.update({key: value})
print(word_freq2)
The problem is that I don't know how to add the correct value, because right now I get just 1 as the value for every key, as shown here:
{'a': 1, 'about': 1, 'all': 1, 'although': 1, 'and': 1, 'are': 1, 'at': 1, 'baby': 1, 'backseat': 1, 'bag': 1, 'bar': 1, 'be': 1, 'bedsheets': 1, 'begin': 1, 'best': 1, 'body': 1, 'boy': 1, 'brand': 1, 'can': 1, 'chance': 1, 'club': 1, 'come': 1, 'conversation': 1, 'crazy': 1, 'dance': 1, 'date': 1, 'day': 1, 'discovering': 1, 'do': 1, 'doing': 1, "don't": 1, 'drinking': 1, 'driver': 1, 'eat': 1, 'every': 1, 'falling': 1, 'family': 1, 'fast': 1, 'fill': 1, 'find': 1, 'first': 1, 'follow': 1, 'for': 1, 'friends': 1, 'get': 1, 'girl': 1, 'give': 1, 'go': 1, 'going': 1, 'grab': 1, 'hand': 1, 'handmade': 1, 'heart': 1, 'hours': 1, 'how': 1, 'i': 1, "i'll": 1, "i'm": 1, 'in': 1, 'is': 1, "isn't": 1, 'it': 1, 'jukebox': 1, 'just': 1, 'kiss': 1, 'know': 1, 'last': 1, 'lead': 1, 'leave': 1, 'let': 1, "let's": 1, 'like': 1, 'love': 1, 'lover': 1, 'magnet': 1, 'make': 1, 'man': 1, 'may': 1, 'me': 1, 'mind': 1, 'much': 1, 'my': 1, 'new': 1, 'night': 1, 'not': 1, 'now': 1, 'of': 1, 'okay': 1, 'on': 1, 'one': 1, 'our': 1, 'out': 1, 'over': 1, 'place': 1, 'plate': 1, 'play': 1, 'pull': 1, 'push': 1, 'put': 1, 'radio': 1, 'room': 1, 'say': 1, 'shape': 1, 'shots': 1, 'singing': 1, 'slow': 1, 'smell': 1, 'so': 1, 'somebody': 1, 'something': 1, 'sour': 1, 'start': 1, 'stop': 1, 'story': 1, 'sweet': 1, 'table': 1, 'take': 1, 'talk': 1, 'taxi': 1, 'tell': 1, 'that': 1, 'the': 1, 'then': 1, 'thrifty': 1, 'to': 1, 'too': 1, 'trust': 1, 'up': 1, 'van': 1, 'waist': 1, 'want': 1, 'was': 1, 'we': 1, "we're": 1, 'week': 1, 'were': 1, 'where': 1, 'with': 1, 'you': 1, 'your': 1}

This code seems to work just fine:
word_freq = {'love': 25, 'conversation': 1, 'every': 6, "we're": 1, 'plate': 1, 'sour': 1, 'jukebox': 1, 'now': 11, 'taxi': 1, 'fast': 1, 'bag': 1, 'man': 1, 'push': 3, 'baby': 14, 'going': 1, 'you': 16, "don't": 2, 'one': 1, 'mind': 2, 'backseat': 1, 'friends': 1, 'then': 3, 'know': 2, 'take': 1, 'play': 1, 'okay': 1, 'so': 2, 'begin': 1, 'start': 2, 'over': 1, 'body': 17, 'boy': 2, 'just': 1, 'we': 7, 'are': 1, 'girl': 2, 'tell': 1, 'singing': 2, 'drinking': 1, 'put': 3, 'our': 1, 'where': 1, "i'll": 1, 'all': 1, "isn't": 1, 'make': 1, 'lover': 1, 'get': 1, 'radio': 1, 'give': 1, "i'm": 23, 'like': 10, 'can': 1, 'doing': 2, 'with': 22, 'club': 1, 'come': 37, 'it': 1, 'somebody': 2, 'handmade': 2, 'out': 1, 'new': 6, 'room': 3, 'chance': 1, 'follow': 6, 'in': 27, 'may': 2, 'brand': 6, 'that': 2, 'magnet': 3, 'up': 3, 'first': 1, 'and': 23, 'pull': 3, 'of': 6, 'table': 1, 'much': 2, 'last': 3, 'i': 6, 'thrifty': 1, 'grab': 2, 'was': 2, 'driver': 1, 'slow': 1, 'dance': 1, 'the': 18, 'say': 2, 'trust': 1, 'family': 1, 'week': 1, 'date': 1, 'me': 10, 'do': 3, 'waist': 2, 'smell': 3, 'day': 6, 'although': 3, 'your': 21, 'leave': 1, 'want': 2, "let's": 2, 'lead': 6, 'at': 1, 'hand': 1, 'how': 1, 'talk': 4, 'not': 2, 'eat': 1, 'falling': 3, 'about': 1, 'story': 1, 'sweet': 1, 'best': 1, 'crazy': 2, 'let': 1, 'too': 5, 'van': 1, 'shots': 1, 'go': 2, 'to': 2, 'a': 8, 'my': 33, 'is': 5, 'place': 1, 'find': 1, 'shape': 6, 'on': 40, 'kiss': 1, 'were': 3, 'night': 3, 'heart': 3, 'for': 3, 'discovering': 6, 'something': 6, 'be': 16, 'bedsheets': 3, 'fill': 2, 'hours': 2, 'stop': 1, 'bar': 1}
keys = list(word_freq.keys())
keys.sort()
word_freq2 = {}
for key in keys:
    word_freq2[key] = word_freq[key]
print(word_freq2)
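If the list-and-loop steps are not a hard requirement, the same result can probably be had in one line. A minimal sketch, assuming Python 3.7 or later (where dicts keep insertion order) and a shortened sample of word_freq:

```python
# Shortened sample standing in for the full word_freq dictionary above.
word_freq = {'love': 25, 'conversation': 1, 'every': 6, "we're": 1, 'baby': 14}

# sorted() orders the (key, value) pairs by key, and dict() keeps that
# order because dicts preserve insertion order (Python 3.7+).
word_freq2 = dict(sorted(word_freq.items()))
print(word_freq2)
```

Each key keeps its own value because the pairs travel together through the sort.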

How to compute the words given in a dict and compute if the word is on a premium tile?

I am currently working on a project that computes the score in a Scrabble game. I am stuck on the part that calculates the points of a word and checks whether any of its letters sit on a premium tile. This is the data I have so far:
words_dict = {'CART': [(6, 2), (6, 3), (6, 4), (6, 5)], 'THIEF': [(6, 5), (7, 5), (8, 5), (9, 5), (10, 5)], 'HORN': [(7, 5), (7, 6), (7, 7), (7, 8)]}
Next to each word are the coordinates of its letters, as found in a numpy array.
premium_tiles = [[ 3, 1, 1, 2, 1, 1, 1, 3, 1, 1, 1, 2, 1, 1, 3 ],
[ 1, 2, 1, 1, 1, 3, 1, 1, 1, 3, 1, 1, 1, 2, 1 ],
[ 1, 1, 2, 1, 1, 1, 2, 1, 2, 1, 1, 1, 2, 1, 1 ],
[ 2, 1, 1, 2, 1, 1, 1, 2, 1, 1, 1, 2, 1, 1, 2 ],
[ 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1 ],
[ 1, 3, 1, 1, 1, 3, 1, 1, 1, 3, 1, 1, 1, 3, 1 ],
[ 1, 1, 2, 1, 1, 1, 2, 1, 2, 1, 1, 1, 2, 1, 1 ],
[ 3, 1, 1, 2, 1, 1, 1, 2, 1, 1, 1, 2, 1, 1, 3 ],
[ 1, 1, 2, 1, 1, 1, 2, 1, 2, 1, 1, 1, 2, 1, 1 ],
[ 1, 3, 1, 1, 1, 3, 1, 1, 1, 3, 1, 1, 1, 3, 1 ],
[ 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1 ],
[ 2, 1, 1, 2, 1, 1, 1, 2, 1, 1, 1, 2, 1, 1, 2 ],
[ 1, 1, 2, 1, 1, 1, 2, 1, 2, 1, 1, 1, 2, 1, 1 ],
[ 1, 2, 1, 1, 1, 3, 1, 1, 1, 3, 1, 1, 1, 2, 1 ],
[ 3, 1, 1, 2, 1, 1, 1, 3, 1, 1, 1, 2, 1, 1, 3 ]]
This array holds the multipliers I need to calculate the points of a word. I also made a dictionary with the points for each letter, which looks like this:
points = {
'A': 1, 'B': 3, 'C': 3,
'D': 2, 'E': 1, 'F': 4, 'G': 2,
'H': 4, 'I': 1, 'J': 8, 'K': 5,
'L': 1, 'M': 3, 'N': 1, 'O': 1,
'P': 3, 'Q': 10, 'R': 1, 'S': 1,
'T': 1, 'U': 1, 'V': 4, 'W': 4, 'X': 8,
'Y': 4, 'Z': 10, '#': 0
}
I've tried to write the code but I can't seem to make it work. This is what I have so far:
for key,value in words_dict.items():
    for keys in key:
        if len(keys) == 1:
            for i,j in value:
                if keys == key[m+1]:
                    sum1 += premium_tiles[i][j]*points[keys]
        print(sum1)
        sumsOfwords.append(sum1)
print(sumsOfwords)
The letter values are C=3, A=1, R=1, and T=1, and each one should be multiplied by the premium tile at its coordinate. For example, the coordinates for CART are [(6, 2), (6, 3), (6, 4), (6, 5)], so looking those up in premium_tiles gives C=3x2, A=1x1, R=1x1, and T=1x1. CART = (3x2+1x1+1x1+1x1), which equals 9. Can someone help, please? I desperately need it.

I can't test this but it appears to me that you have extra blocks that don't make sense.
sumsOfwords = []
for key, value in words_dict.items():
    sum1 = 0
    for idx, (i, j) in enumerate(value):
        sum1 += premium_tiles[i][j] * points[key[idx]]
    # the multiplier loop goes here
    # don't combine it with the prior loop
    print(sum1)
    sumsOfwords.append(sum1)
print(sumsOfwords)
For the double and triple squares, use another table like premium_tiles that has 1 on every normal square and 2 or 3 on the double and triple squares.
The code would look like this:
for i, j in value:
    if not used[i][j]:
        sum1 *= mult[i][j]
        used[i][j] = True
where used and mult are tables like premium_tiles, except that used starts out full of False and mult has 1 anywhere there isn't a 2 or 3.
How can I count the points if the words are the same?
The issue there is that words_dict needs to change from a dictionary to a list of tuples [(word, [positions])], as a dictionary only permits one entry per key.
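Putting the letter-scoring loop and the list-of-tuples idea together, a rough sketch might look like the following. The names placed_words and score_word are made up for illustration, the board and letter points are copied from the question, and the separate used/mult pass is left out:

```python
# Letter-multiplier board copied from the question (15 x 15).
premium_tiles = [
    [3, 1, 1, 2, 1, 1, 1, 3, 1, 1, 1, 2, 1, 1, 3],
    [1, 2, 1, 1, 1, 3, 1, 1, 1, 3, 1, 1, 1, 2, 1],
    [1, 1, 2, 1, 1, 1, 2, 1, 2, 1, 1, 1, 2, 1, 1],
    [2, 1, 1, 2, 1, 1, 1, 2, 1, 1, 1, 2, 1, 1, 2],
    [1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1],
    [1, 3, 1, 1, 1, 3, 1, 1, 1, 3, 1, 1, 1, 3, 1],
    [1, 1, 2, 1, 1, 1, 2, 1, 2, 1, 1, 1, 2, 1, 1],
    [3, 1, 1, 2, 1, 1, 1, 2, 1, 1, 1, 2, 1, 1, 3],
    [1, 1, 2, 1, 1, 1, 2, 1, 2, 1, 1, 1, 2, 1, 1],
    [1, 3, 1, 1, 1, 3, 1, 1, 1, 3, 1, 1, 1, 3, 1],
    [1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1],
    [2, 1, 1, 2, 1, 1, 1, 2, 1, 1, 1, 2, 1, 1, 2],
    [1, 1, 2, 1, 1, 1, 2, 1, 2, 1, 1, 1, 2, 1, 1],
    [1, 2, 1, 1, 1, 3, 1, 1, 1, 3, 1, 1, 1, 2, 1],
    [3, 1, 1, 2, 1, 1, 1, 3, 1, 1, 1, 2, 1, 1, 3],
]

# Points per letter, copied from the question.
points = {
    'A': 1, 'B': 3, 'C': 3, 'D': 2, 'E': 1, 'F': 4, 'G': 2,
    'H': 4, 'I': 1, 'J': 8, 'K': 5, 'L': 1, 'M': 3, 'N': 1,
    'O': 1, 'P': 3, 'Q': 10, 'R': 1, 'S': 1, 'T': 1, 'U': 1,
    'V': 4, 'W': 4, 'X': 8, 'Y': 4, 'Z': 10, '#': 0,
}

# A list of (word, positions) tuples, so the same word can appear twice.
placed_words = [
    ('CART', [(6, 2), (6, 3), (6, 4), (6, 5)]),
    ('THIEF', [(6, 5), (7, 5), (8, 5), (9, 5), (10, 5)]),
    ('HORN', [(7, 5), (7, 6), (7, 7), (7, 8)]),
    ('CART', [(6, 2), (6, 3), (6, 4), (6, 5)]),  # duplicate word, kept
]


def score_word(word, positions):
    """Sum each letter's points times the multiplier on its square."""
    return sum(points[letter] * premium_tiles[row][col]
               for letter, (row, col) in zip(word, positions))


scores = [(word, score_word(word, positions)) for word, positions in placed_words]
print(scores)  # [('CART', 9), ('THIEF', 13), ('HORN', 8), ('CART', 9)]
```

zip pairs each letter with its own coordinate, which replaces the key[idx] indexing, and CART comes out as 9 as in the worked example from the question.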

Python Pyplot word occurrence frequency

I have to plot how many words occur at each frequency in a txt file. So far I have a dictionary that contains each word and the number of times it appears in the file. In order to plot, I have to convert that dictionary into a new dictionary (I'm assuming) that counts the number of words at each frequency. For instance, if 5 words each appear 3 times in the txt file, they should be grouped into a single entry, so that the plot has the frequency on the x axis and the number of words at that frequency on the y axis.
What I have now is simply not working:
def plot(word_dict):
    new_dict = {}
    for value in word_dict.values():
        if value in word_dict:
            new_dict += 1
        else:
            new_dict = 1
    y = new_dict[value]
    x = word_dict[value]
    pyplot.plot(x, y)
    pyplot.show()
A sample of data:
{'bangs': 1, 'sees': 1, 'stuff,': 1, 'Knox....': 1, 'Well': 1, 'about': 2, 'your': 1, 'blocks.': 1, 'what': 4, 'beetles....': 1, 'Boom': 1, 'blue': 1, 'paddled': 1, 'mixed': 1, 'fox': 5, 'Through': 1, 'on': 16, 'trick,': 2, 'When': 4, '...a': 1, 'silly': 1, 'band.': 2, 'come.': 3, "We'll": 2, 'likes': 2, 'slick,': 1, 'comes?': 1, 'chick': 1, 'goo,': 1, "it's": 2, 'then,': 1, 'muddled': 1, 'Now': 3, 'not': 1, 'flew,': 1, 'If,': 1, 'sneeze.': 1, 'bottled': 1, 'paddle': 4, 'called': 1, 'Goo-Goose,': 1, 'Blue': 2, 'Come': 1, 'fox.': 1, 'can': 3, 'poodle,': 1, 'this': 7, "Sue's": 4, 'Ben': 5, 'is': 7, 'goes': 1, 'to': 10, 'Crow': 4, 'cheese': 2, 'quick': 5, 'sir.': 27, 'easy,': 1, 'Clocks': 2, 'Fox': 6, 'Stop': 2, 'up': 1, 'be': 1, 'Well...': 1, 'hose': 2, 'Rose': 1, 'three': 4, 'Freezy': 2, 'New': 3, 'hate': 1, 'broom': 2, 'quite': 1, 'duck': 3, 'we': 1, 'done,': 1, 'tick.': 2, "can't": 5, 'beetles?': 1, 'well,': 2, 'box.': 4, "That's": 4, 'Do,': 1, 'say': 4, 'chicks': 5, '...': 1, 'enough,': 1, 'brick': 1, 'lot': 1, 'You': 4, 'sick': 2, 'that': 1, 'goo.': 4, 'Gooey': 1, 'made': 3, 'new': 5, 'noodles...': 1, 'Knox,': 6, 'for': 2, 'muddle.': 1, 'Bricks': 1, 'Luck': 4, 'Bim': 5, 'minute,': 1, 'brings': 2, 'bottle': 4, 'duddled': 1, "I'll": 3, 'come': 2, 'battles': 1, 'clocks': 2, 'such': 2, 'Then': 1, 'in': 19, 'sir....': 1, 'Two': 1, 'Knox.': 2, "Luke's": 1, 'lakes.': 4, 'trees': 3, "isn't": 2, 'band!': 4, 'our': 1, 'And': 2, 'blubber!': 1, 'another': 1, 'sews': 9, "bottle's": 1, "Crow's": 3, 'Step': 1, 'What': 1, 'grows': 1, 'like': 1, 'ticks': 2, 'too': 1, 'trick': 4, 'Fox,': 3, 'goo': 2, 'chewing!': 1, 'blocks': 3, 'fleas': 3, 'a': 24, 'lakes': 2, "don't": 2, 'those': 1, 'Luke': 4, 'sorry,': 1, 'tocks,': 2, 'Whose': 1, 'you': 3, 'Here': 1, 'tricks': 2, "poodle's": 1, 'they': 3, 'that.': 1, 'doing.': 1, 'Gluey.': 2, 'eating': 1, 'sir!': 1, 'breeze': 2, 'My': 4, 'tweetle': 11, 'these': 5, 'puddle,': 2, 'chewy': 1, 'tongue': 3, 'talk': 1, 'with': 11, 'beetles': 6, 
'noodle': 2, 'make': 5, 'who': 1, 'lame,': 1, 'flew.': 1, "I'm": 1, 'Fox!': 2, 'Nose': 1, 'the': 7, 'I': 9, "crow's": 2, 'Thank': 1, 'easy': 2, 'likes.': 2, 'battle': 7, 'licks': 4, 'goes.': 1, 'socks': 4, 'lead': 1, 'muddle': 1, 'shame,': 1, 'Please,': 1, 'fight,': 1, 'fun,': 1, 'chew,': 2, 'fuddled': 1, 'Broom': 1, 'No,': 1, 'Hose': 1, 'something': 2, 'find': 3, 'know': 1, 'Who': 4, 'call...': 1, 'First,': 1, 'Gooey.': 2, 'Look,': 2, 'fight': 1, 'This': 1, "Luck's": 1, 'poor': 2, 'now.': 6, 'freeze.': 2, 'game': 4, "Ben's": 5, 'it!': 2, 'Joe': 5, 'their': 2, 'you,': 1, 'Box': 1, 'bands.': 2, 'it': 3, 'bands': 1, 'bricks': 5, "here's": 1, "Let's": 3, 'Sue': 5, 'when': 2, 'clocks,': 2, 'breaks.': 2, 'puddle': 8, 'Socks': 4, 'sir,': 6, 'an': 2, "Bim's": 5, 'Pig': 2, 'now....': 1, 'battle.': 4, 'Slow': 5, 'sew': 2, 'blew.': 1, 'bring': 1, 'game,': 1, 'AND...': 3, 'and': 16, 'brooms.': 1, 'way.': 2, 'booms.': 1, 'lots': 1, 'clock': 1, 'comes.': 4, 'please....': 1, 'then...': 1, '...they': 2, 'say....': 1, 'beetle': 7, 'nose.': 1, 'slow,': 1, 'or': 1, 'Six': 2, 'AND': 1, 'block': 1, 'broom.': 4, 'do': 6, 'it,': 1, 'some.': 2, 'Duck': 1, 'sir?': 2, 'grows.': 1, 'this,': 1, 'Very': 2, 'Big': 2, 'whose': 3, 'noodle-eating': 1, 'chew': 2, 'choose': 2, 'Mr.': 13, 'band': 2, "Here's": 2, 'it.': 2, 'call': 3, 'dumb': 1, 'have': 2, 'so': 2, 'Goo-Goose': 1, 'say.': 2, 'socks.': 5, "trees'": 1, 'poodle': 3, 'socks,': 4, 'my': 1, 'While': 1, 'play.': 2, 'Chicks': 3, 'stack.': 4, 'rose': 2, 'freezy': 1, 'clothes.': 3, 'makes': 1, 'little': 1, 'paddles': 3, 'box': 2, 'all': 1, 'free': 2, 'blocks,': 1, 'Do': 1, 'blab': 1, 'THIS': 1, 'thing': 1, 'bends': 2, 'bent': 2, 'Knox': 8, 'socks?': 2, 'tock.': 2, 'wuddled': 1, 'much': 1, 'takes': 2, 'bends.': 2, 'wait': 1, 'see': 1, 'rubber.': 1, 'of': 4, 'clothes?': 2, 'mouth': 3, 'bottle...': 1, 'too,': 1, 'blibber': 1, 'Try': 2, 'where': 1, "won't": 2, 'get': 1}

Use a Counter from the collections library.
The values you want to count are the values of your word_dict (i.e. the frequencies of each word), so you'll need to initialize the Counter instance like freq = Counter(word_dict.values()). Then you can extract the x and y series for your plot with freq.keys() and freq.values().
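As a rough sketch of the Counter approach, assuming a small sample in place of the full word_dict and assuming matplotlib is available for the plotting part (plot_frequency_counts is a made-up helper name and is not called here):

```python
from collections import Counter

# Small sample standing in for the full word_dict from the question.
word_dict = {'bangs': 1, 'sees': 1, 'about': 2, 'what': 4, 'fox': 5,
             'When': 4, 'trick,': 2, 'Ben': 5, 'quick': 5}

# Count how many words share each frequency.
freq = Counter(word_dict.values())

# Sort by frequency so the x axis runs from low to high.
x, y = zip(*sorted(freq.items()))
print(x)  # (1, 2, 4, 5)
print(y)  # (2, 2, 2, 3)


def plot_frequency_counts(x, y):
    """Bar chart of words per frequency; assumes matplotlib is installed."""
    from matplotlib import pyplot
    pyplot.bar(x, y)
    pyplot.xlabel('frequency')
    pyplot.ylabel('number of words')
    pyplot.show()
```

Sorting the items first matters because a Counter returns its keys in first-seen order, which would otherwise scramble a line plot.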

It seems as though you are attempting to plot strings along your x-axis, namely the keys you are using. This is not how pyplot works. You need to plot your values against a numeric vector (typically a numpy array). Once you have done this you can relabel your independent (x) vector using the xticks command.
x = numpy.linspace(0, len(new_dict) - 1, len(new_dict))
pyplot.xticks(x, list(new_dict.keys()))

Assuming you mean inverting the keys and values, you can do:
>>> di={'bangs': 1, 'sees': 1, 'stuff,': 1, 'Knox....': 1, 'Well': 1, 'about': 2, 'your': 1, 'blocks.': 1, 'what': 4, 'beetles....': 1, 'Boom': 1, 'blue': 1, 'paddled': 1, 'mixed': 1, 'fox': 5, 'Through': 1, 'on': 16, 'trick,': 2, 'When': 4, '...a': 1, 'silly': 1, 'band.': 2, 'come.': 3, "We'll": 2, 'likes': 2, 'slick,': 1, 'comes?': 1, 'chick': 1, 'goo,': 1, "it's": 2, 'then,': 1, 'muddled': 1, 'Now': 3, 'not': 1, 'flew,': 1, 'If,': 1, 'sneeze.': 1, 'bottled': 1, 'paddle': 4, 'called': 1, 'Goo-Goose,': 1, 'Blue': 2, 'Come': 1, 'fox.': 1, 'can': 3, 'poodle,': 1, 'this': 7, "Sue's": 4, 'Ben': 5, 'is': 7, 'goes': 1, 'to': 10, 'Crow': 4, 'cheese': 2, 'quick': 5, 'sir.': 27, 'easy,': 1, 'Clocks': 2, 'Fox': 6, 'Stop': 2, 'up': 1, 'be': 1, 'Well...': 1, 'hose': 2, 'Rose': 1, 'three': 4, 'Freezy': 2, 'New': 3, 'hate': 1, 'broom': 2, 'quite': 1, 'duck': 3, 'we': 1, 'done,': 1, 'tick.': 2, "can't": 5, 'beetles?': 1, 'well,': 2, 'box.': 4, "That's": 4, 'Do,': 1, 'say': 4, 'chicks': 5, '...': 1, 'enough,': 1, 'brick': 1, 'lot': 1, 'You': 4, 'sick': 2, 'that': 1, 'goo.': 4, 'Gooey': 1, 'made': 3, 'new': 5, 'noodles...': 1, 'Knox,': 6, 'for': 2, 'muddle.': 1, 'Bricks': 1, 'Luck': 4, 'Bim': 5, 'minute,': 1, 'brings': 2, 'bottle': 4, 'duddled': 1, "I'll": 3, 'come': 2, 'battles': 1, 'clocks': 2, 'such': 2, 'Then': 1, 'in': 19, 'sir....': 1, 'Two': 1, 'Knox.': 2, "Luke's": 1, 'lakes.': 4, 'trees': 3, "isn't": 2, 'band!': 4, 'our': 1, 'And': 2, 'blubber!': 1, 'another': 1, 'sews': 9, "bottle's": 1, "Crow's": 3, 'Step': 1, 'What': 1, 'grows': 1, 'like': 1, 'ticks': 2, 'too': 1, 'trick': 4, 'Fox,': 3, 'goo': 2, 'chewing!': 1, 'blocks': 3, 'fleas': 3, 'a': 24, 'lakes': 2, "don't": 2, 'those': 1, 'Luke': 4, 'sorry,': 1, 'tocks,': 2, 'Whose': 1, 'you': 3, 'Here': 1, 'tricks': 2, "poodle's": 1, 'they': 3, 'that.': 1, 'doing.': 1, 'Gluey.': 2, 'eating': 1, 'sir!': 1, 'breeze': 2, 'My': 4, 'tweetle': 11, 'these': 5, 'puddle,': 2, 'chewy': 1, 'tongue': 3, 'talk': 1, 'with': 11, 'beetles': 
6, 'noodle': 2, 'make': 5, 'who': 1, 'lame,': 1, 'flew.': 1, "I'm": 1, 'Fox!': 2, 'Nose': 1, 'the': 7, 'I': 9, "crow's": 2, 'Thank': 1, 'easy': 2, 'likes.': 2, 'battle': 7, 'licks': 4, 'goes.': 1, 'socks': 4, 'lead': 1, 'muddle': 1, 'shame,': 1, 'Please,': 1, 'fight,': 1, 'fun,': 1, 'chew,': 2, 'fuddled': 1, 'Broom': 1, 'No,': 1, 'Hose': 1, 'something': 2, 'find': 3, 'know': 1, 'Who': 4, 'call...': 1, 'First,': 1, 'Gooey.': 2, 'Look,': 2, 'fight': 1, 'This': 1, "Luck's": 1, 'poor': 2, 'now.': 6, 'freeze.': 2, 'game': 4, "Ben's": 5, 'it!': 2, 'Joe': 5, 'their': 2, 'you,': 1, 'Box': 1, 'bands.': 2, 'it': 3, 'bands': 1, 'bricks': 5, "here's": 1, "Let's": 3, 'Sue': 5, 'when': 2, 'clocks,': 2, 'breaks.': 2, 'puddle': 8, 'Socks': 4, 'sir,': 6, 'an': 2, "Bim's": 5, 'Pig': 2, 'now....': 1, 'battle.': 4, 'Slow': 5, 'sew': 2, 'blew.': 1, 'bring': 1, 'game,': 1, 'AND...': 3, 'and': 16, 'brooms.': 1, 'way.': 2, 'booms.': 1, 'lots': 1, 'clock': 1, 'comes.': 4, 'please....': 1, 'then...': 1, '...they': 2, 'say....': 1, 'beetle': 7, 'nose.': 1, 'slow,': 1, 'or': 1, 'Six': 2, 'AND': 1, 'block': 1, 'broom.': 4, 'do': 6, 'it,': 1, 'some.': 2, 'Duck': 1, 'sir?': 2, 'grows.': 1, 'this,': 1, 'Very': 2, 'Big': 2, 'whose': 3, 'noodle-eating': 1, 'chew': 2, 'choose': 2, 'Mr.': 13, 'band': 2, "Here's": 2, 'it.': 2, 'call': 3, 'dumb': 1, 'have': 2, 'so': 2, 'Goo-Goose': 1, 'say.': 2, 'socks.': 5, "trees'": 1, 'poodle': 3, 'socks,': 4, 'my': 1, 'While': 1, 'play.': 2, 'Chicks': 3, 'stack.': 4, 'rose': 2, 'freezy': 1, 'clothes.': 3, 'makes': 1, 'little': 1, 'paddles': 3, 'box': 2, 'all': 1, 'free': 2, 'blocks,': 1, 'Do': 1, 'blab': 1, 'THIS': 1, 'thing': 1, 'bends': 2, 'bent': 2, 'Knox': 8, 'socks?': 2, 'tock.': 2, 'wuddled': 1, 'much': 1, 'takes': 2, 'bends.': 2, 'wait': 1, 'see': 1, 'rubber.': 1, 'of': 4, 'clothes?': 2, 'mouth': 3, 'bottle...': 1, 'too,': 1, 'blibber': 1, 'Try': 2, 'where': 1, "won't": 2, 'get': 1}
>>> new_di = {}
>>> for k, v in di.items():
...     new_di.setdefault(v, []).append(k)
...
>>> new_di
{1: ['What', 'game,', 'Whose', 'Thank', 'Broom', 'goo,', 'bring', 'fuddled', 'hate', 'Hose', 'then,', 'sneeze.', 'Here', 'sir....', 'Please,', '...', 'it,', 'get', 'Goo-Goose', 'bands', 'muddle', 'nose.', 'Goo-Goose,', 'sorry,', 'not', "I'm", 'little', 'No,', 'like', 'THIS', 'poodle,', 'Knox....', 'Bricks', 'blibber', 'chick', 'where', 'Rose', 'see', 'noodle-eating', 'call...', 'fun,', 'blue', 'chewing!', 'clock', 'lots', 'slow,', 'sir!', 'chewy', 'goes', 'beetles?', 'Do', 'goes.', 'flew.', 'Box', 'be', 'we', 'eating', 'this,', 'stuff,', "poodle's", 'Duck', 'Well...', 'then...', 'quite', 'minute,', 'Step', 'doing.', 'wait', 'brooms.', 'bottle...', 'thing', 'bangs', 'mixed', 'fight,', 'makes', 'or', 'grows.', 'duddled', 'all', 'too,', 'Two', 'Gooey', 'Boom', 'another', 'If,', 'done,', 'your', '...a', 'First,', 'now....', 'fight', 'muddle.', "trees'", 'too', 'lot', 'enough,', 'blew.', 'brick', 'This', 'Come', 'easy,', 'that', 'Well', "Luke's", 'those', "here's", 'say....', 'up', 'you,', 'freezy', 'silly', 'flew,', 'wuddled', 'dumb', 'my', 'called', 'lame,', 'sees', 'Do,', 'comes?', "Luck's", 'blubber!', 'rubber.', 'shame,', 'paddled', 'Then', 'blab', 'battles', 'booms.', 'bottled', 'please....', 'Through', 'grows', 'muddled', 'that.', 'our', 'who', 'much', 'slick,', 'Nose', 'blocks,', "bottle's", 'While', 'beetles....', 'noodles...', 'lead', 'fox.', 'AND', 'blocks.', 'block', 'talk', 'know'], 2: ['Blue', "don't", 'choose', 'clocks', 'band.', 'tock.', 'Big', 'broom', 'some.', "crow's", 'easy', 'it.', 'it!', 'Try', 'tocks,', 'Pig', 'Clocks', "isn't", 'likes', 'sew', 'chew', 'bends', 'Very', 'box', 'puddle,', 'Knox.', 'band', 'Six', 'for', 'ticks', '...they', "Here's", 'hose', 'And', 'free', 'say.', 'come', 'about', 'chew,', 'likes.', 'Freezy', 'way.', 'tick.', 'rose', 'cheese', 'bent', 'takes', 'their', "it's", "We'll", 'Fox!', 'brings', 'noodle', 'clocks,', 'Gooey.', 'Gluey.', 'sir?', 'when', 'breaks.', 'have', 'an', 'well,', 'something', 'clothes?', 'bends.', 'Stop', 
'trick,', 'sick', 'poor', "won't", 'bands.', 'goo', 'play.', 'socks?', 'such', 'tricks', 'freeze.', 'breeze', 'so', 'lakes', 'Look,'], 3: ['find', 'Now', 'mouth', 'trees', 'they', 'Chicks', 'fleas', 'New', 'come.', 'whose', 'AND...', 'tongue', 'poodle', 'duck', 'call', 'Fox,', "I'll", 'made', 'can', 'paddles', 'it', 'clothes.', "Let's", 'you', 'blocks', "Crow's"], 4: ['goo.', 'band!', 'game', 'socks', 'battle.', 'My', 'lakes.', 'broom.', 'what', 'paddle', "Sue's", 'of', 'When', 'Socks', 'three', 'box.', 'licks', "That's", 'trick', 'socks,', 'say', 'comes.', 'You', 'stack.', 'Luke', 'Who', 'Luck', 'Crow', 'bottle'], 5: ['chicks', 'Bim', 'quick', 'Sue', 'fox', 'Joe', 'new', "Bim's", "can't", 'bricks', 'socks.', "Ben's", 'Ben', 'Slow', 'make', 'these'], 6: ['Fox', 'Knox,', 'do', 'now.', 'sir,', 'beetles'], 7: ['beetle', 'battle', 'this', 'is', 'the'], 8: ['Knox', 'puddle'], 9: ['sews', 'I'], 10: ['to'], 11: ['tweetle', 'with'], 13: ['Mr.'], 16: ['and', 'on'], 19: ['in'], 24: ['a'], 27: ['sir.']}
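From an inverted dictionary like that, the number of words at each frequency is just the length of each list. A small sketch, assuming a cut-down di and a made-up name words_per_frequency:

```python
# Cut-down sample; the real di is the large dictionary above.
di = {'bangs': 1, 'about': 2, 'what': 4, 'sees': 1, 'When': 4}

# Group the words by how often they occur.
new_di = {}
for word, count in di.items():
    new_di.setdefault(count, []).append(word)

# Number of words at each frequency, ordered by frequency.
words_per_frequency = {count: len(words) for count, words in sorted(new_di.items())}
print(words_per_frequency)  # {1: 2, 2: 1, 4: 2}
```

The keys and values of words_per_frequency are then the x and y series the question asks for.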

I'm not sure what you used for tokenizing your data, but a quick solution could be using nltk.
Here is a small example on how it can be done:
# necessary imports
from nltk import FreqDist # used later to plot and get count
from nltk.tokenize import word_tokenize # tokenizes our sentence by word
# sample text
text = ('this is a super long text, that has some random words in it. It is not really '
        'that long, but could be very long.')
tknz = word_tokenize(text) # tokenizes the text into ('this', 'is',...)
fdist = FreqDist(tknz) # creates frequency distribution from the tokenized words
From that you can simply call fdist.plot(), which draws the frequency plot.
From here you have a matplotlib plot that you can edit, and it only took a few lines to obtain.
You can find additional information about FreqDist in the NLTK documentation. It also behaves like a dictionary:
>>> fdist.items()
dict_items([(',', 2), ('in', 1), ('a', 1), ('very', 1), ('really', 1), ('be', 1), ...])

how to make the values of dictionary appear in alphabetical order

Let's say I have this dictionary:
{'song': 1, 'like': 1, 'most': 1, 'neer': 1, 'hides': 1, 'live': 1, 'yours': 1, 'come': 2, 'not': 1, 'rage': 1, 'deserts': 1, 'be': 2, 'graces': 1, 'metre': 1, 'rights': 1, 'tomb': 1, 'stretched': 1, 'verse': 1, 'write': 1, 'the': 2, 'beauty': 1, 'all': 1, 'should': 2, 'it': 3, 'rhyme': 1, 'is': 1, 'this': 1, 'in': 4, 'earthly': 1, 'numbers': 1, 'to': 2, 'if': 2, 'my': 3, 'yet': 1, 'less': 1, 'would': 1, 'life': 1, 'an': 1, 'alive': 1, 'number': 1, 'a': 2, 'child': 1, 'say': 1, 'tongue': 1, 'heavenly': 1, 'knows': 1, 'men': 1, 'could': 1, 'half': 1, 'so': 1, 'parts': 1, 'their': 1, 'high': 1, 'with': 2, 'believe': 1, 'such': 1, 'that': 1, 'papers': 1, 'eyes': 1, 'antique': 1, 'age': 2, 'were': 2, 'fresh': 1, 'lies': 1, 'than': 1, 'poet': 1, 'termed': 1, 'old': 1, 'touches': 1, 'and': 5, 'but': 2, 'some': 1, 'of': 4, 'time': 2, 'touched': 1, 'twice': 1, 'will': 1, 'yellowed': 1, 'you': 1, 'though': 1, 'heaven': 1, 'poets': 1, 'truth': 1, 'who': 1, 'i': 1, 'faces': 1, 'which': 1, 'scorned': 1, 'shows': 1, 'filled': 1, 'your': 6, 'true': 1, 'as': 1}
How do I go about ordering the key-value pairs alphabetically by key?
I tried doing:
for key, value in sorted(freqs.items()):
    freqs[key] = value
but that doesn't do anything. I want it to look like this:
ab 5
and 8
...
yours 2

Dicts are not sorted data structures, but you can traverse them in sorted order using:
for key in sorted(freqs.keys()):
    print(key, freqs[key])

collections.OrderedDict is for this purpose. Example:
>>> from collections import OrderedDict
>>> # regular unsorted dictionary
>>> d = {'banana': 3, 'apple':4, 'pear': 1, 'orange': 2}
>>> # dictionary sorted by key
>>> OrderedDict(sorted(d.items(), key=lambda t: t[0]))
OrderedDict([('apple', 4), ('banana', 3), ('orange', 2), ('pear', 1)])

Dictionaries in Python are not sorted by key. Before Python 3.7 the iteration order was an implementation detail you could not rely on; from 3.7 onwards a dictionary keeps its keys in insertion order, which is still not alphabetical order. Unless I am understanding something wrong?

Try one of these:
https://pypi.python.org/pypi/treap
https://pypi.python.org/pypi/red-black-tree-mod
The treap is a hybrid of a tree and a heap. It works like a sorted (by key) dictionary.
The red-black tree is a tree. It also works like a sorted, by key, dictionary.
Some say treaps are faster than red-black trees on average, but that treaps have a greater standard deviation in operation times.
Both of them do almost everything in O(log n) time, except sorting. They both keep everything sorted by key, nonstop.
Sometimes it's better to sort the keys of a standard dictionary, but it's rarely a good idea to sort inside a loop.
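To illustrate the "always sorted by key" idea without either package, here is a toy stand-in built on the standard library's bisect module. It is not the API of treap or red-black-tree-mod; SortedKeyDict is a made-up class, and its inserts are O(n) because of the list shift rather than O(log n):

```python
import bisect


class SortedKeyDict:
    """Toy mapping that keeps its keys sorted (not the treap/red-black API)."""

    def __init__(self):
        self._keys = []    # always kept in sorted order
        self._values = {}  # ordinary dict for the lookups

    def __setitem__(self, key, value):
        if key not in self._values:
            # Binary search for the slot, then insert into the list.
            bisect.insort(self._keys, key)
        self._values[key] = value

    def __getitem__(self, key):
        return self._values[key]

    def items(self):
        # Keys are already sorted, so no sort happens here.
        return [(key, self._values[key]) for key in self._keys]


freqs = SortedKeyDict()
for word, count in [('yours', 2), ('ab', 5), ('and', 8)]:
    freqs[word] = count

for word, count in freqs.items():
    print(word, count)
```

For a dictionary that is built once and printed once, sorting the keys a single time outside any loop is simpler and usually fast enough.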

Creating an ARFF file from python output

gardai-plan-crackdown-on-troublemakers-at-protest-2438316.html': {'dail': 1, 'focus': 1, 'actions': 1, 'trade': 2, 'protest': 1, 'identify': 1, 'previous': 1, 'detectives': 1, 'republican': 1, 'group': 1, 'monitor': 1, 'clashes': 1, 'civil': 1, 'charge': 1, 'breaches': 1, 'travelling': 1, 'main': 1, 'disrupt': 1, 'real': 1, 'policing': 3, 'march': 6, 'finance': 1, 'drawn': 1, 'assistant': 1, 'protesters': 1, 'emphasised': 1, 'department': 1, 'traffic': 2, 'outbreak': 1, 'culprits': 1, 'proportionate': 1, 'instructions': 1, 'warned': 2, 'commanders': 1, 'michael': 2, 'exploit': 1, 'culminating': 1, 'large': 2, 'continue': 1, 'team': 1, 'hijack': 1, 'disorder': 1, 'square': 1, 'leaders': 1, 'deal': 2, 'people': 3, 'streets': 1, 'demonstrations': 2, 'observed': 1, 'street': 2, 'college': 1, 'organised': 1, 'operation': 1, 'special': 1, 'shown': 1, 'attendance': 1, 'normal': 1, 'unions': 2, 'individuals': 1, 'safety': 2, 'prosecuted': 1, 'ira': 1, 'ground': 1, 'public': 2, 'told': 1, 'body': 1, 'stewards': 2, 'obey': 1, 'business': 1, 'gathered': 1, 'assemble': 1, 'garda': 5, 'sinn': 1, 'broken': 1, 'fachtna': 1, 'management': 2, 'possibility': 1, 'groups': 3, 'put': 1, 'affiliated': 1, 'strong': 2, 'security': 1, 'stage': 1, 'behaviour': 1, 'involved': 1, 'route': 2, 'violence': 1, 'dublin': 3, 'fein': 1, 'ensure': 2, 'stand': 1, 'act': 2, 'contingency': 1, 'troublemakers': 2, 'facilitate': 2, 'road': 1, 'members': 1, 'prepared': 1, 'presence': 1, 'sullivan': 2, 'reassure': 1, 'number': 3, 'community': 1, 'strategic': 1, 'visible': 2, 'addressed': 1, 'notify': 1, 'trained': 1, 'eirigi': 1, 'city': 4, 'gpo': 1, 'from': 3, 'crowd': 1, 'visit': 1, 'wood': 1, 'editor': 1, 'peaceful': 4, 'expected': 2, 'today': 1, 'commissioner': 4, 'quay': 1, 'ictu': 1, 'advance': 1, 'murphy': 2, 'gardai': 6, 'aware': 1, 'closures': 1, 'courts': 1, 'branch': 1, 'deployed': 1, 'made': 1, 'thousands': 1, 'socialist': 1, 'work': 1, 'supt': 2, 'feehan': 1, 'mr': 1, 'briefing': 1, 'visited': 
1, 'manner': 1, 'irish': 2, 'metropolitan': 1, 'spotters': 1, 'organisers': 1, 'in': 13, 'dissident': 1, 'evidence': 1, 'tom': 1, 'arrangements': 3, 'experience': 1, 'allowed': 1, 'sought': 1, 'rally': 1, 'connell': 1, 'officers': 3, 'potential': 1, 'holding': 1, 'units': 1, 'place': 2, 'events': 1, 'dignified': 1, 'planned': 1, 'independent': 1, 'added': 2, 'plans': 1, 'congress': 1, 'centre': 3, 'comprehensive': 1, 'measures': 1, 'yesterday': 2, 'alert': 1, 'important': 1, 'moving': 1, 'plan': 2, 'highly': 1, 'law': 2, 'senior': 2, 'fair': 1, 'recent': 1, 'refuse': 1, 'attempt': 1, 'brady': 1, 'liaising': 1, 'conscious': 1, 'light': 1, 'clear': 1, 'headquarters': 1, 'wing': 1, 'chief': 2, 'maintain': 1, 'harcourt': 1, 'order': 2, 'left': 1}}
I have a python script that extracts words from text files and counts the number of times they occur in the file.
I want to add them to an ".ARFF" file to use for weka classification.
Above is an example output of my python script.
How do I go about inserting them into an ARFF file while keeping each text file separate? Each file's words are grouped inside its own {...} block.

I know it's pretty easy to generate an arff file on your own, but I still wanted to make it simpler so I wrote a python package
https://github.com/ubershmekel/arff
It's also on pypi so easy_install arff

The ARFF file format is well documented and very simple to generate. For example, using a cut-down version of your Python dictionary, the following script:
import re
d = {'gardai-plan-crackdown-on-troublemakers-at-protest-2438316.html':
        {'dail': 1,
         'focus': 1,
         'actions': 1,
         'trade': 2,
         'protest': 1,
         'identify': 1}}
for original_filename in d.keys():
    # Only handle keys that look like "<name>.html"
    m = re.search(r'^(.*)\.html$', original_filename)
    if not m:
        print("Ignoring the file:", original_filename)
        continue
    output_filename = m.group(1) + '.arff'
    with open(output_filename, "w") as fp:
        # ARFF header lines start with "@"
        fp.write('''@RELATION wordcounts
@ATTRIBUTE word string
@ATTRIBUTE count numeric
@DATA
''')
        for word_and_count in d[original_filename].items():
            fp.write("%s,%d\n" % word_and_count)
Generates output of the form:
@RELATION wordcounts
@ATTRIBUTE word string
@ATTRIBUTE count numeric
@DATA
dail,1
focus,1
actions,1
trade,2
protest,1
identify,1
... in a file called gardai-plan-crackdown-on-troublemakers-at-protest-2438316.arff. If that's not exactly what you want, I'm sure you can easily alter it. (For example, if the "words" might have spaces or other punctuation in them, you probably want to quote them.)
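The quoting mentioned above could be sketched roughly like this. It assumes single-quoted strings with backslash escapes, which is my understanding of what Weka accepts, so check it against the ARFF documentation; arff_quote and to_arff are made-up helper names, and the text is built in memory rather than written to a file:

```python
import re


def arff_quote(word):
    """Quote a word if it contains characters an ARFF parser may trip on.

    Assumes single-quote quoting with backslash escapes.
    """
    if re.search(r"[\s,'\"%{}\\]", word):
        escaped = word.replace("\\", "\\\\").replace("'", "\\'")
        return "'" + escaped + "'"
    return word


def to_arff(relation, counts):
    """Build the ARFF text for one file's word counts."""
    lines = ["@RELATION " + relation,
             "@ATTRIBUTE word string",
             "@ATTRIBUTE count numeric",
             "@DATA"]
    for word, count in sorted(counts.items()):
        lines.append("%s,%d" % (arff_quote(word), count))
    return "\n".join(lines) + "\n"


print(to_arff("wordcounts", {"trade": 2, "sinn fein": 1, "dail": 1}))
```

Plain words pass through unchanged, so the output for the sample dictionary in the script stays the same apart from the rows being sorted.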

The liac-arff project seems to be a bit more up to date. You can install it via
pip:
$ pip install liac-arff
or easy_install:
$ easy_install liac-arff
