I need to sort this dictionary that counts the times that some words appear in a song:
word_freq = {'love': 25, 'conversation': 1, 'every': 6, "we're": 1, 'plate': 1, 'sour': 1, 'jukebox': 1, 'now': 11, 'taxi': 1, 'fast': 1, 'bag': 1, 'man': 1, 'push': 3, 'baby': 14, 'going': 1, 'you': 16, "don't": 2, 'one': 1, 'mind': 2, 'backseat': 1, 'friends': 1, 'then': 3, 'know': 2, 'take': 1, 'play': 1, 'okay': 1, 'so': 2, 'begin': 1, 'start': 2, 'over': 1, 'body': 17, 'boy': 2, 'just': 1, 'we': 7, 'are': 1, 'girl': 2, 'tell': 1, 'singing': 2, 'drinking': 1, 'put': 3, 'our': 1, 'where': 1, "i'll": 1, 'all': 1, "isn't": 1, 'make': 1, 'lover': 1, 'get': 1, 'radio': 1, 'give': 1, "i'm": 23, 'like': 10, 'can': 1, 'doing': 2, 'with': 22, 'club': 1, 'come': 37, 'it': 1, 'somebody': 2, 'handmade': 2, 'out': 1, 'new': 6, 'room': 3, 'chance': 1, 'follow': 6, 'in': 27, 'may': 2, 'brand': 6, 'that': 2, 'magnet': 3, 'up': 3, 'first': 1, 'and': 23, 'pull': 3, 'of': 6, 'table': 1, 'much': 2, 'last': 3, 'i': 6, 'thrifty': 1, 'grab': 2, 'was': 2, 'driver': 1, 'slow': 1, 'dance': 1, 'the': 18, 'say': 2, 'trust': 1, 'family': 1, 'week': 1, 'date': 1, 'me': 10, 'do': 3, 'waist': 2, 'smell': 3, 'day': 6, 'although': 3, 'your': 21, 'leave': 1, 'want': 2, "let's": 2, 'lead': 6, 'at': 1, 'hand': 1, 'how': 1, 'talk': 4, 'not': 2, 'eat': 1, 'falling': 3, 'about': 1, 'story': 1, 'sweet': 1, 'best': 1, 'crazy': 2, 'let': 1, 'too': 5, 'van': 1, 'shots': 1, 'go': 2, 'to': 2, 'a': 8, 'my': 33, 'is': 5, 'place': 1, 'find': 1, 'shape': 6, 'on': 40, 'kiss': 1, 'were': 3, 'night': 3, 'heart': 3, 'for': 3, 'discovering': 6, 'something': 6, 'be': 16, 'bedsheets': 3, 'fill': 2, 'hours': 2, 'stop': 1, 'bar': 1}
To do this, I need to:
Create a new list containing just the keys of the dictionary.
keys = list(word_freq.keys())
Sort the key list.
keys.sort()
Create an empty dictionary.
word_freq2 = {}
Use a for loop to iterate over each key in the list. For each key, find the corresponding value in the first dictionary and insert the key-value pair into the new empty dictionary.
This is my best solution up to now:
for key in keys:
    if key in word_freq:
        word_freq2.update({key: value})
print(word_freq2)
The problem is that I don't know how to add the correct value, because right now I get just 1 as the value for every key, as shown here:
{'a': 1, 'about': 1, 'all': 1, 'although': 1, 'and': 1, 'are': 1, 'at': 1, 'baby': 1, 'backseat': 1, 'bag': 1, 'bar': 1, 'be': 1, 'bedsheets': 1, 'begin': 1, 'best': 1, 'body': 1, 'boy': 1, 'brand': 1, 'can': 1, 'chance': 1, 'club': 1, 'come': 1, 'conversation': 1, 'crazy': 1, 'dance': 1, 'date': 1, 'day': 1, 'discovering': 1, 'do': 1, 'doing': 1, "don't": 1, 'drinking': 1, 'driver': 1, 'eat': 1, 'every': 1, 'falling': 1, 'family': 1, 'fast': 1, 'fill': 1, 'find': 1, 'first': 1, 'follow': 1, 'for': 1, 'friends': 1, 'get': 1, 'girl': 1, 'give': 1, 'go': 1, 'going': 1, 'grab': 1, 'hand': 1, 'handmade': 1, 'heart': 1, 'hours': 1, 'how': 1, 'i': 1, "i'll": 1, "i'm": 1, 'in': 1, 'is': 1, "isn't": 1, 'it': 1, 'jukebox': 1, 'just': 1, 'kiss': 1, 'know': 1, 'last': 1, 'lead': 1, 'leave': 1, 'let': 1, "let's": 1, 'like': 1, 'love': 1, 'lover': 1, 'magnet': 1, 'make': 1, 'man': 1, 'may': 1, 'me': 1, 'mind': 1, 'much': 1, 'my': 1, 'new': 1, 'night': 1, 'not': 1, 'now': 1, 'of': 1, 'okay': 1, 'on': 1, 'one': 1, 'our': 1, 'out': 1, 'over': 1, 'place': 1, 'plate': 1, 'play': 1, 'pull': 1, 'push': 1, 'put': 1, 'radio': 1, 'room': 1, 'say': 1, 'shape': 1, 'shots': 1, 'singing': 1, 'slow': 1, 'smell': 1, 'so': 1, 'somebody': 1, 'something': 1, 'sour': 1, 'start': 1, 'stop': 1, 'story': 1, 'sweet': 1, 'table': 1, 'take': 1, 'talk': 1, 'taxi': 1, 'tell': 1, 'that': 1, 'the': 1, 'then': 1, 'thrifty': 1, 'to': 1, 'too': 1, 'trust': 1, 'up': 1, 'van': 1, 'waist': 1, 'want': 1, 'was': 1, 'we': 1, "we're": 1, 'week': 1, 'were': 1, 'where': 1, 'with': 1, 'you': 1, 'your': 1}
This code seems to work just fine:
word_freq = {'love': 25, 'conversation': 1, 'every': 6, "we're": 1, 'plate': 1, 'sour': 1, 'jukebox': 1, 'now': 11, 'taxi': 1, 'fast': 1, 'bag': 1, 'man': 1, 'push': 3, 'baby': 14, 'going': 1, 'you': 16, "don't": 2, 'one': 1, 'mind': 2, 'backseat': 1, 'friends': 1, 'then': 3, 'know': 2, 'take': 1, 'play': 1, 'okay': 1, 'so': 2, 'begin': 1, 'start': 2, 'over': 1, 'body': 17, 'boy': 2, 'just': 1, 'we': 7, 'are': 1, 'girl': 2, 'tell': 1, 'singing': 2, 'drinking': 1, 'put': 3, 'our': 1, 'where': 1, "i'll": 1, 'all': 1, "isn't": 1, 'make': 1, 'lover': 1, 'get': 1, 'radio': 1, 'give': 1, "i'm": 23, 'like': 10, 'can': 1, 'doing': 2, 'with': 22, 'club': 1, 'come': 37, 'it': 1, 'somebody': 2, 'handmade': 2, 'out': 1, 'new': 6, 'room': 3, 'chance': 1, 'follow': 6, 'in': 27, 'may': 2, 'brand': 6, 'that': 2, 'magnet': 3, 'up': 3, 'first': 1, 'and': 23, 'pull': 3, 'of': 6, 'table': 1, 'much': 2, 'last': 3, 'i': 6, 'thrifty': 1, 'grab': 2, 'was': 2, 'driver': 1, 'slow': 1, 'dance': 1, 'the': 18, 'say': 2, 'trust': 1, 'family': 1, 'week': 1, 'date': 1, 'me': 10, 'do': 3, 'waist': 2, 'smell': 3, 'day': 6, 'although': 3, 'your': 21, 'leave': 1, 'want': 2, "let's": 2, 'lead': 6, 'at': 1, 'hand': 1, 'how': 1, 'talk': 4, 'not': 2, 'eat': 1, 'falling': 3, 'about': 1, 'story': 1, 'sweet': 1, 'best': 1, 'crazy': 2, 'let': 1, 'too': 5, 'van': 1, 'shots': 1, 'go': 2, 'to': 2, 'a': 8, 'my': 33, 'is': 5, 'place': 1, 'find': 1, 'shape': 6, 'on': 40, 'kiss': 1, 'were': 3, 'night': 3, 'heart': 3, 'for': 3, 'discovering': 6, 'something': 6, 'be': 16, 'bedsheets': 3, 'fill': 2, 'hours': 2, 'stop': 1, 'bar': 1}
keys = list(word_freq.keys())
keys.sort()
word_freq2 = {}
for key in keys:
    word_freq2[key] = word_freq[key]
print(word_freq2)
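The same key-sorted dictionary can also be built in one step. This is a minimal sketch, assuming Python 3.7 or later (where dictionaries keep insertion order) and using a shortened subset of the question's word_freq purely for illustration:

```python
# Shortened subset of the question's word_freq, for illustration only
word_freq = {'love': 25, 'conversation': 1, 'every': 6, "we're": 1, 'baby': 14}

# sorted() orders the (key, value) pairs by key; dict() keeps that order
word_freq2 = dict(sorted(word_freq.items()))
print(word_freq2)
```

Each key keeps its own count because the pairs are sorted together rather than looked up afterwards.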
I am currently working on a project that computes the score in a Scrabble game. I am stuck on the part that calculates the points of a word and checks whether its letters are on premium tiles. This is the code I have so far:
words_dict = {'CART': [(6, 2), (6, 3), (6, 4), (6, 5)], 'THIEF': [(6, 5), (7, 5), (8, 5), (9, 5), (10, 5)], 'HORN': [(7, 5), (7, 6), (7, 7), (7, 8)]}
Next to the words are the coordinates for every word that was found in a numpy array.
premium_tiles = [[ 3, 1, 1, 2, 1, 1, 1, 3, 1, 1, 1, 2, 1, 1, 3 ],
[ 1, 2, 1, 1, 1, 3, 1, 1, 1, 3, 1, 1, 1, 2, 1 ],
[ 1, 1, 2, 1, 1, 1, 2, 1, 2, 1, 1, 1, 2, 1, 1 ],
[ 2, 1, 1, 2, 1, 1, 1, 2, 1, 1, 1, 2, 1, 1, 2 ],
[ 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1 ],
[ 1, 3, 1, 1, 1, 3, 1, 1, 1, 3, 1, 1, 1, 3, 1 ],
[ 1, 1, 2, 1, 1, 1, 2, 1, 2, 1, 1, 1, 2, 1, 1 ],
[ 3, 1, 1, 2, 1, 1, 1, 2, 1, 1, 1, 2, 1, 1, 3 ],
[ 1, 1, 2, 1, 1, 1, 2, 1, 2, 1, 1, 1, 2, 1, 1 ],
[ 1, 3, 1, 1, 1, 3, 1, 1, 1, 3, 1, 1, 1, 3, 1 ],
[ 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1 ],
[ 2, 1, 1, 2, 1, 1, 1, 2, 1, 1, 1, 2, 1, 1, 2 ],
[ 1, 1, 2, 1, 1, 1, 2, 1, 2, 1, 1, 1, 2, 1, 1 ],
[ 1, 2, 1, 1, 1, 3, 1, 1, 1, 3, 1, 1, 1, 2, 1 ],
[ 3, 1, 1, 2, 1, 1, 1, 3, 1, 1, 1, 2, 1, 1, 3 ]]
This array holds the multipliers I need to calculate the points of a word. For the expected result, take CART as an example, using the letter values I defined like this:
points = {
'A': 1, 'B': 3, 'C': 3,
'D': 2, 'E': 1, 'F': 4, 'G': 2,
'H': 4, 'I': 1, 'J': 8, 'K': 5,
'L': 1, 'M': 3, 'N': 1, 'O': 1,
'P': 3, 'Q': 10, 'R': 1, 'S': 1,
'T': 1, 'U': 1, 'V': 4, 'W': 4, 'X': 8,
'Y': 4, 'Z': 10, '#': 0
}
I've tried to write the code, but I can't seem to make it work. This is what I have so far:
for key,value in words_dict.items():
    for keys in key:
        if len(keys) == 1:
            for i,j in value:
                if keys == key[m+1]:
                    sum1 += premium_tiles[i][j]*points[keys]
            print(sum1)
            sumsOfwords.append(sum1)
print(sumsOfwords)
Since C=3, A=1, R=1, and T=1, each letter should be multiplied by the premium tile at its coordinate. For example, the coordinates of CART are [(6, 2), (6, 3), (6, 4), (6, 5)], so looking them up in premium_tiles gives C=3x2, A=1x1, R=1x1, and T=1x1. CART = (3x2 + 1x1 + 1x1 + 1x1), which equals 9. Can someone help, please? I need this desperately.
I can't test this but it appears to me that you have extra blocks that don't make sense.
sumsOfwords = []
for key, value in words_dict.items():
    sum1 = 0
    for idx, (i, j) in enumerate(value):
        sum1 += premium_tiles[i][j] * points[key[idx]]
    # the multiplier loop goes here
    # don't combine it with the prior loop
    print(sum1)
    sumsOfwords.append(sum1)
print(sumsOfwords)
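To check the loop against the CART example from the question, here is a minimal self-contained sketch. The helper name score_word is hypothetical, and only row 6 of premium_tiles and the four letter values that CART needs are copied in:

```python
def score_word(word, coords, tiles, points):
    """Sum each letter's value times the multiplier at its coordinate."""
    return sum(points[letter] * tiles[i][j]
               for letter, (i, j) in zip(word, coords))

# Only the letter values CART needs, copied from the question's points table
points = {'C': 3, 'A': 1, 'R': 1, 'T': 1}

# Only row 6 of the question's premium_tiles, keyed by its row index
tiles = {6: [1, 1, 2, 1, 1, 1, 2, 1, 2, 1, 1, 1, 2, 1, 1]}

cart_score = score_word('CART', [(6, 2), (6, 3), (6, 4), (6, 5)], tiles, points)
print(cart_score)  # 3*2 + 1*1 + 1*1 + 1*1 = 9
```

Pairing letters with coordinates through zip avoids the index bookkeeping that the original nested loops attempted.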
For the double- and triple-word squares, use another table like premium_tiles that has 1 on every normal square and 2 or 3 on the double and triple squares.
The code would look like this:
for i, j in value:
    if not used[i][j]:
        sum1 *= mult[i][j]
        used[i][j] = True
where used and mult are tables like premium_tiles except that used starts out full of False and mult has 1 anywhere there isn't a 2 or 3.
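A small sketch of how the used and mult tables behave across two words that share a square. The 3x3 board, the position of the double-word square, and the base scores are all made up for illustration, and the helper name is hypothetical:

```python
def apply_word_multipliers(base_score, coords, mult, used):
    """Multiply the score by each word multiplier not consumed yet."""
    total = base_score
    for i, j in coords:
        if not used[i][j]:
            total *= mult[i][j]
            used[i][j] = True  # a multiplier only counts the first time
    return total

size = 3  # tiny hypothetical board
mult = [[1] * size for _ in range(size)]
mult[0][0] = 2  # hypothetical double-word square
used = [[False] * size for _ in range(size)]

first = apply_word_multipliers(5, [(0, 0), (0, 1)], mult, used)
second = apply_word_multipliers(4, [(0, 0), (1, 0)], mult, used)
print(first, second)
```

The first word is doubled, while the second word crossing the same square keeps its base score because the square is already marked as used.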
How can I count the points if the words are the same?
The issue there is that words_dict needs to change from a dictionary to a list of tuples, [(word, [positions])], because a dictionary only permits one entry per key.
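As a rough sketch of that list-of-tuples layout: the second CART placement below is invented for illustration, and the sparse letter_mult lookup is an assumption that holds only the premium_tiles entries these two placements touch (every other square counts as 1).

```python
points = {'C': 3, 'A': 1, 'R': 1, 'T': 1}

# Entries taken from the question's premium_tiles; missing squares count as 1
letter_mult = {(6, 2): 2, (0, 0): 3, (0, 3): 2}

# The same word can now appear more than once
placed_words = [
    ('CART', [(6, 2), (6, 3), (6, 4), (6, 5)]),
    ('CART', [(0, 0), (0, 1), (0, 2), (0, 3)]),  # hypothetical second placement
]

scores = []
for word, positions in placed_words:
    total = sum(points[letter] * letter_mult.get(pos, 1)
                for letter, pos in zip(word, positions))
    scores.append((word, total))
print(scores)
```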
I have to plot how many words occur at each frequency in a txt file. So far I have a dictionary that contains each word and the number of times it appears in the txt file. In order to plot, I have to convert that dictionary into a new dictionary (I'm assuming) that counts the number of words at each frequency. For instance, if 5 words appear 3 times in the txt file, those need to be a single dictionary entry, so that the plot shows the frequency on the x axis and the number of words at that frequency on the y axis.
What I have now is simply not working:
def plot(word_dict):
    new_dict = {}
    for value in word_dict.values():
        if value in word_dict:
            new_dict += 1
        else:
            new_dict = 1
    y = new_dict[value]
    x = word_dict[value]
    pyplot.plot(x, y)
    pyplot.show()
A sample of data:
{'bangs': 1, 'sees': 1, 'stuff,': 1, 'Knox....': 1, 'Well': 1, 'about': 2, 'your': 1, 'blocks.': 1, 'what': 4, 'beetles....': 1, 'Boom': 1, 'blue': 1, 'paddled': 1, 'mixed': 1, 'fox': 5, 'Through': 1, 'on': 16, 'trick,': 2, 'When': 4, '...a': 1, 'silly': 1, 'band.': 2, 'come.': 3, "We'll": 2, 'likes': 2, 'slick,': 1, 'comes?': 1, 'chick': 1, 'goo,': 1, "it's": 2, 'then,': 1, 'muddled': 1, 'Now': 3, 'not': 1, 'flew,': 1, 'If,': 1, 'sneeze.': 1, 'bottled': 1, 'paddle': 4, 'called': 1, 'Goo-Goose,': 1, 'Blue': 2, 'Come': 1, 'fox.': 1, 'can': 3, 'poodle,': 1, 'this': 7, "Sue's": 4, 'Ben': 5, 'is': 7, 'goes': 1, 'to': 10, 'Crow': 4, 'cheese': 2, 'quick': 5, 'sir.': 27, 'easy,': 1, 'Clocks': 2, 'Fox': 6, 'Stop': 2, 'up': 1, 'be': 1, 'Well...': 1, 'hose': 2, 'Rose': 1, 'three': 4, 'Freezy': 2, 'New': 3, 'hate': 1, 'broom': 2, 'quite': 1, 'duck': 3, 'we': 1, 'done,': 1, 'tick.': 2, "can't": 5, 'beetles?': 1, 'well,': 2, 'box.': 4, "That's": 4, 'Do,': 1, 'say': 4, 'chicks': 5, '...': 1, 'enough,': 1, 'brick': 1, 'lot': 1, 'You': 4, 'sick': 2, 'that': 1, 'goo.': 4, 'Gooey': 1, 'made': 3, 'new': 5, 'noodles...': 1, 'Knox,': 6, 'for': 2, 'muddle.': 1, 'Bricks': 1, 'Luck': 4, 'Bim': 5, 'minute,': 1, 'brings': 2, 'bottle': 4, 'duddled': 1, "I'll": 3, 'come': 2, 'battles': 1, 'clocks': 2, 'such': 2, 'Then': 1, 'in': 19, 'sir....': 1, 'Two': 1, 'Knox.': 2, "Luke's": 1, 'lakes.': 4, 'trees': 3, "isn't": 2, 'band!': 4, 'our': 1, 'And': 2, 'blubber!': 1, 'another': 1, 'sews': 9, "bottle's": 1, "Crow's": 3, 'Step': 1, 'What': 1, 'grows': 1, 'like': 1, 'ticks': 2, 'too': 1, 'trick': 4, 'Fox,': 3, 'goo': 2, 'chewing!': 1, 'blocks': 3, 'fleas': 3, 'a': 24, 'lakes': 2, "don't": 2, 'those': 1, 'Luke': 4, 'sorry,': 1, 'tocks,': 2, 'Whose': 1, 'you': 3, 'Here': 1, 'tricks': 2, "poodle's": 1, 'they': 3, 'that.': 1, 'doing.': 1, 'Gluey.': 2, 'eating': 1, 'sir!': 1, 'breeze': 2, 'My': 4, 'tweetle': 11, 'these': 5, 'puddle,': 2, 'chewy': 1, 'tongue': 3, 'talk': 1, 'with': 11, 'beetles': 6, 
'noodle': 2, 'make': 5, 'who': 1, 'lame,': 1, 'flew.': 1, "I'm": 1, 'Fox!': 2, 'Nose': 1, 'the': 7, 'I': 9, "crow's": 2, 'Thank': 1, 'easy': 2, 'likes.': 2, 'battle': 7, 'licks': 4, 'goes.': 1, 'socks': 4, 'lead': 1, 'muddle': 1, 'shame,': 1, 'Please,': 1, 'fight,': 1, 'fun,': 1, 'chew,': 2, 'fuddled': 1, 'Broom': 1, 'No,': 1, 'Hose': 1, 'something': 2, 'find': 3, 'know': 1, 'Who': 4, 'call...': 1, 'First,': 1, 'Gooey.': 2, 'Look,': 2, 'fight': 1, 'This': 1, "Luck's": 1, 'poor': 2, 'now.': 6, 'freeze.': 2, 'game': 4, "Ben's": 5, 'it!': 2, 'Joe': 5, 'their': 2, 'you,': 1, 'Box': 1, 'bands.': 2, 'it': 3, 'bands': 1, 'bricks': 5, "here's": 1, "Let's": 3, 'Sue': 5, 'when': 2, 'clocks,': 2, 'breaks.': 2, 'puddle': 8, 'Socks': 4, 'sir,': 6, 'an': 2, "Bim's": 5, 'Pig': 2, 'now....': 1, 'battle.': 4, 'Slow': 5, 'sew': 2, 'blew.': 1, 'bring': 1, 'game,': 1, 'AND...': 3, 'and': 16, 'brooms.': 1, 'way.': 2, 'booms.': 1, 'lots': 1, 'clock': 1, 'comes.': 4, 'please....': 1, 'then...': 1, '...they': 2, 'say....': 1, 'beetle': 7, 'nose.': 1, 'slow,': 1, 'or': 1, 'Six': 2, 'AND': 1, 'block': 1, 'broom.': 4, 'do': 6, 'it,': 1, 'some.': 2, 'Duck': 1, 'sir?': 2, 'grows.': 1, 'this,': 1, 'Very': 2, 'Big': 2, 'whose': 3, 'noodle-eating': 1, 'chew': 2, 'choose': 2, 'Mr.': 13, 'band': 2, "Here's": 2, 'it.': 2, 'call': 3, 'dumb': 1, 'have': 2, 'so': 2, 'Goo-Goose': 1, 'say.': 2, 'socks.': 5, "trees'": 1, 'poodle': 3, 'socks,': 4, 'my': 1, 'While': 1, 'play.': 2, 'Chicks': 3, 'stack.': 4, 'rose': 2, 'freezy': 1, 'clothes.': 3, 'makes': 1, 'little': 1, 'paddles': 3, 'box': 2, 'all': 1, 'free': 2, 'blocks,': 1, 'Do': 1, 'blab': 1, 'THIS': 1, 'thing': 1, 'bends': 2, 'bent': 2, 'Knox': 8, 'socks?': 2, 'tock.': 2, 'wuddled': 1, 'much': 1, 'takes': 2, 'bends.': 2, 'wait': 1, 'see': 1, 'rubber.': 1, 'of': 4, 'clothes?': 2, 'mouth': 3, 'bottle...': 1, 'too,': 1, 'blibber': 1, 'Try': 2, 'where': 1, "won't": 2, 'get': 1}
Use a Counter from the collections library.
Since the values you want to count are the values of your word_dict (i.e. the frequencies of each word), you'll need to initialize the Counter instance like freq = Counter(word_dict.values()). Then you can extract the x and y series for your plot with freq.keys() and freq.values().
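A minimal sketch of that approach, using a handful of entries from the sample data. The plotting call is left as a comment so the snippet runs without matplotlib, and sorting the frequencies is an extra step assumed here to keep the x axis in order:

```python
from collections import Counter

# A few entries from the question's sample data
word_dict = {'fox': 5, 'Ben': 5, 'what': 4, 'on': 16, 'bangs': 1, 'sees': 1}

# Count how many words share each frequency
freq = Counter(word_dict.values())

x = sorted(freq)           # the distinct frequencies
y = [freq[k] for k in x]   # number of words at each frequency
print(x, y)

# With matplotlib installed, something like pyplot.bar(x, y) would plot it
```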
It seems as though you are attempting to plot strings along your x-axis, namely the keys you are using. This is not how pyplot works. You need to plot your values against a numeric vector (typically a numpy array). Once you have done this you can relabel your independent (x) vector using the xticks command.
x = numpy.linspace(0, len(new_dict) - 1, len(new_dict))
pyplot.xticks(x, list(new_dict.keys()))
Assuming you mean reversing the keys and values, you can do:
>>> di={'bangs': 1, 'sees': 1, 'stuff,': 1, 'Knox....': 1, 'Well': 1, 'about': 2, 'your': 1, 'blocks.': 1, 'what': 4, 'beetles....': 1, 'Boom': 1, 'blue': 1, 'paddled': 1, 'mixed': 1, 'fox': 5, 'Through': 1, 'on': 16, 'trick,': 2, 'When': 4, '...a': 1, 'silly': 1, 'band.': 2, 'come.': 3, "We'll": 2, 'likes': 2, 'slick,': 1, 'comes?': 1, 'chick': 1, 'goo,': 1, "it's": 2, 'then,': 1, 'muddled': 1, 'Now': 3, 'not': 1, 'flew,': 1, 'If,': 1, 'sneeze.': 1, 'bottled': 1, 'paddle': 4, 'called': 1, 'Goo-Goose,': 1, 'Blue': 2, 'Come': 1, 'fox.': 1, 'can': 3, 'poodle,': 1, 'this': 7, "Sue's": 4, 'Ben': 5, 'is': 7, 'goes': 1, 'to': 10, 'Crow': 4, 'cheese': 2, 'quick': 5, 'sir.': 27, 'easy,': 1, 'Clocks': 2, 'Fox': 6, 'Stop': 2, 'up': 1, 'be': 1, 'Well...': 1, 'hose': 2, 'Rose': 1, 'three': 4, 'Freezy': 2, 'New': 3, 'hate': 1, 'broom': 2, 'quite': 1, 'duck': 3, 'we': 1, 'done,': 1, 'tick.': 2, "can't": 5, 'beetles?': 1, 'well,': 2, 'box.': 4, "That's": 4, 'Do,': 1, 'say': 4, 'chicks': 5, '...': 1, 'enough,': 1, 'brick': 1, 'lot': 1, 'You': 4, 'sick': 2, 'that': 1, 'goo.': 4, 'Gooey': 1, 'made': 3, 'new': 5, 'noodles...': 1, 'Knox,': 6, 'for': 2, 'muddle.': 1, 'Bricks': 1, 'Luck': 4, 'Bim': 5, 'minute,': 1, 'brings': 2, 'bottle': 4, 'duddled': 1, "I'll": 3, 'come': 2, 'battles': 1, 'clocks': 2, 'such': 2, 'Then': 1, 'in': 19, 'sir....': 1, 'Two': 1, 'Knox.': 2, "Luke's": 1, 'lakes.': 4, 'trees': 3, "isn't": 2, 'band!': 4, 'our': 1, 'And': 2, 'blubber!': 1, 'another': 1, 'sews': 9, "bottle's": 1, "Crow's": 3, 'Step': 1, 'What': 1, 'grows': 1, 'like': 1, 'ticks': 2, 'too': 1, 'trick': 4, 'Fox,': 3, 'goo': 2, 'chewing!': 1, 'blocks': 3, 'fleas': 3, 'a': 24, 'lakes': 2, "don't": 2, 'those': 1, 'Luke': 4, 'sorry,': 1, 'tocks,': 2, 'Whose': 1, 'you': 3, 'Here': 1, 'tricks': 2, "poodle's": 1, 'they': 3, 'that.': 1, 'doing.': 1, 'Gluey.': 2, 'eating': 1, 'sir!': 1, 'breeze': 2, 'My': 4, 'tweetle': 11, 'these': 5, 'puddle,': 2, 'chewy': 1, 'tongue': 3, 'talk': 1, 'with': 11, 'beetles': 
6, 'noodle': 2, 'make': 5, 'who': 1, 'lame,': 1, 'flew.': 1, "I'm": 1, 'Fox!': 2, 'Nose': 1, 'the': 7, 'I': 9, "crow's": 2, 'Thank': 1, 'easy': 2, 'likes.': 2, 'battle': 7, 'licks': 4, 'goes.': 1, 'socks': 4, 'lead': 1, 'muddle': 1, 'shame,': 1, 'Please,': 1, 'fight,': 1, 'fun,': 1, 'chew,': 2, 'fuddled': 1, 'Broom': 1, 'No,': 1, 'Hose': 1, 'something': 2, 'find': 3, 'know': 1, 'Who': 4, 'call...': 1, 'First,': 1, 'Gooey.': 2, 'Look,': 2, 'fight': 1, 'This': 1, "Luck's": 1, 'poor': 2, 'now.': 6, 'freeze.': 2, 'game': 4, "Ben's": 5, 'it!': 2, 'Joe': 5, 'their': 2, 'you,': 1, 'Box': 1, 'bands.': 2, 'it': 3, 'bands': 1, 'bricks': 5, "here's": 1, "Let's": 3, 'Sue': 5, 'when': 2, 'clocks,': 2, 'breaks.': 2, 'puddle': 8, 'Socks': 4, 'sir,': 6, 'an': 2, "Bim's": 5, 'Pig': 2, 'now....': 1, 'battle.': 4, 'Slow': 5, 'sew': 2, 'blew.': 1, 'bring': 1, 'game,': 1, 'AND...': 3, 'and': 16, 'brooms.': 1, 'way.': 2, 'booms.': 1, 'lots': 1, 'clock': 1, 'comes.': 4, 'please....': 1, 'then...': 1, '...they': 2, 'say....': 1, 'beetle': 7, 'nose.': 1, 'slow,': 1, 'or': 1, 'Six': 2, 'AND': 1, 'block': 1, 'broom.': 4, 'do': 6, 'it,': 1, 'some.': 2, 'Duck': 1, 'sir?': 2, 'grows.': 1, 'this,': 1, 'Very': 2, 'Big': 2, 'whose': 3, 'noodle-eating': 1, 'chew': 2, 'choose': 2, 'Mr.': 13, 'band': 2, "Here's": 2, 'it.': 2, 'call': 3, 'dumb': 1, 'have': 2, 'so': 2, 'Goo-Goose': 1, 'say.': 2, 'socks.': 5, "trees'": 1, 'poodle': 3, 'socks,': 4, 'my': 1, 'While': 1, 'play.': 2, 'Chicks': 3, 'stack.': 4, 'rose': 2, 'freezy': 1, 'clothes.': 3, 'makes': 1, 'little': 1, 'paddles': 3, 'box': 2, 'all': 1, 'free': 2, 'blocks,': 1, 'Do': 1, 'blab': 1, 'THIS': 1, 'thing': 1, 'bends': 2, 'bent': 2, 'Knox': 8, 'socks?': 2, 'tock.': 2, 'wuddled': 1, 'much': 1, 'takes': 2, 'bends.': 2, 'wait': 1, 'see': 1, 'rubber.': 1, 'of': 4, 'clothes?': 2, 'mouth': 3, 'bottle...': 1, 'too,': 1, 'blibber': 1, 'Try': 2, 'where': 1, "won't": 2, 'get': 1}
new_di={}
for k, v in di.items():
    new_di.setdefault(v, []).append(k)
>>> new_di
{1: ['What', 'game,', 'Whose', 'Thank', 'Broom', 'goo,', 'bring', 'fuddled', 'hate', 'Hose', 'then,', 'sneeze.', 'Here', 'sir....', 'Please,', '...', 'it,', 'get', 'Goo-Goose', 'bands', 'muddle', 'nose.', 'Goo-Goose,', 'sorry,', 'not', "I'm", 'little', 'No,', 'like', 'THIS', 'poodle,', 'Knox....', 'Bricks', 'blibber', 'chick', 'where', 'Rose', 'see', 'noodle-eating', 'call...', 'fun,', 'blue', 'chewing!', 'clock', 'lots', 'slow,', 'sir!', 'chewy', 'goes', 'beetles?', 'Do', 'goes.', 'flew.', 'Box', 'be', 'we', 'eating', 'this,', 'stuff,', "poodle's", 'Duck', 'Well...', 'then...', 'quite', 'minute,', 'Step', 'doing.', 'wait', 'brooms.', 'bottle...', 'thing', 'bangs', 'mixed', 'fight,', 'makes', 'or', 'grows.', 'duddled', 'all', 'too,', 'Two', 'Gooey', 'Boom', 'another', 'If,', 'done,', 'your', '...a', 'First,', 'now....', 'fight', 'muddle.', "trees'", 'too', 'lot', 'enough,', 'blew.', 'brick', 'This', 'Come', 'easy,', 'that', 'Well', "Luke's", 'those', "here's", 'say....', 'up', 'you,', 'freezy', 'silly', 'flew,', 'wuddled', 'dumb', 'my', 'called', 'lame,', 'sees', 'Do,', 'comes?', "Luck's", 'blubber!', 'rubber.', 'shame,', 'paddled', 'Then', 'blab', 'battles', 'booms.', 'bottled', 'please....', 'Through', 'grows', 'muddled', 'that.', 'our', 'who', 'much', 'slick,', 'Nose', 'blocks,', "bottle's", 'While', 'beetles....', 'noodles...', 'lead', 'fox.', 'AND', 'blocks.', 'block', 'talk', 'know'], 2: ['Blue', "don't", 'choose', 'clocks', 'band.', 'tock.', 'Big', 'broom', 'some.', "crow's", 'easy', 'it.', 'it!', 'Try', 'tocks,', 'Pig', 'Clocks', "isn't", 'likes', 'sew', 'chew', 'bends', 'Very', 'box', 'puddle,', 'Knox.', 'band', 'Six', 'for', 'ticks', '...they', "Here's", 'hose', 'And', 'free', 'say.', 'come', 'about', 'chew,', 'likes.', 'Freezy', 'way.', 'tick.', 'rose', 'cheese', 'bent', 'takes', 'their', "it's", "We'll", 'Fox!', 'brings', 'noodle', 'clocks,', 'Gooey.', 'Gluey.', 'sir?', 'when', 'breaks.', 'have', 'an', 'well,', 'something', 'clothes?', 'bends.', 'Stop', 
'trick,', 'sick', 'poor', "won't", 'bands.', 'goo', 'play.', 'socks?', 'such', 'tricks', 'freeze.', 'breeze', 'so', 'lakes', 'Look,'], 3: ['find', 'Now', 'mouth', 'trees', 'they', 'Chicks', 'fleas', 'New', 'come.', 'whose', 'AND...', 'tongue', 'poodle', 'duck', 'call', 'Fox,', "I'll", 'made', 'can', 'paddles', 'it', 'clothes.', "Let's", 'you', 'blocks', "Crow's"], 4: ['goo.', 'band!', 'game', 'socks', 'battle.', 'My', 'lakes.', 'broom.', 'what', 'paddle', "Sue's", 'of', 'When', 'Socks', 'three', 'box.', 'licks', "That's", 'trick', 'socks,', 'say', 'comes.', 'You', 'stack.', 'Luke', 'Who', 'Luck', 'Crow', 'bottle'], 5: ['chicks', 'Bim', 'quick', 'Sue', 'fox', 'Joe', 'new', "Bim's", "can't", 'bricks', 'socks.', "Ben's", 'Ben', 'Slow', 'make', 'these'], 6: ['Fox', 'Knox,', 'do', 'now.', 'sir,', 'beetles'], 7: ['beetle', 'battle', 'this', 'is', 'the'], 8: ['Knox', 'puddle'], 9: ['sews', 'I'], 10: ['to'], 11: ['tweetle', 'with'], 13: ['Mr.'], 16: ['and', 'on'], 19: ['in'], 24: ['a'], 27: ['sir.']}
I'm not sure what you used for tokenizing your data, but a quick solution could be using nltk.
Here is a small example on how it can be done:
# necessary imports
from nltk import FreqDist # used later to plot and get count
from nltk.tokenize import word_tokenize # tokenizes our sentence by word
# sample text
text = 'this is a super long text, that has some random words in it. It is not really that long, but could be very long.'
tknz = word_tokenize(text) # tokenizes the text into ['this', 'is', ...]
fdist = FreqDist(tknz) # creates frequency distribution from the tokenized words
From that you can simply call fdist.plot(), which draws a frequency plot of the tokens.
From here you have a matplotlib plot that you can edit, and it only took a few lines to obtain.
You can find additional information about FreqDist in the NLTK documentation. It also behaves like a dictionary:
>>> fdist.items()
dict_items([(',', 2), ('in', 1), ('a', 1), ('very', 1), ('really', 1), ('be', 1), ...])
gardai-plan-crackdown-on-troublemakers-at-protest-2438316.html': {'dail': 1, 'focus': 1, 'actions': 1, 'trade': 2, 'protest': 1, 'identify': 1, 'previous': 1, 'detectives': 1, 'republican': 1, 'group': 1, 'monitor': 1, 'clashes': 1, 'civil': 1, 'charge': 1, 'breaches': 1, 'travelling': 1, 'main': 1, 'disrupt': 1, 'real': 1, 'policing': 3, 'march': 6, 'finance': 1, 'drawn': 1, 'assistant': 1, 'protesters': 1, 'emphasised': 1, 'department': 1, 'traffic': 2, 'outbreak': 1, 'culprits': 1, 'proportionate': 1, 'instructions': 1, 'warned': 2, 'commanders': 1, 'michael': 2, 'exploit': 1, 'culminating': 1, 'large': 2, 'continue': 1, 'team': 1, 'hijack': 1, 'disorder': 1, 'square': 1, 'leaders': 1, 'deal': 2, 'people': 3, 'streets': 1, 'demonstrations': 2, 'observed': 1, 'street': 2, 'college': 1, 'organised': 1, 'operation': 1, 'special': 1, 'shown': 1, 'attendance': 1, 'normal': 1, 'unions': 2, 'individuals': 1, 'safety': 2, 'prosecuted': 1, 'ira': 1, 'ground': 1, 'public': 2, 'told': 1, 'body': 1, 'stewards': 2, 'obey': 1, 'business': 1, 'gathered': 1, 'assemble': 1, 'garda': 5, 'sinn': 1, 'broken': 1, 'fachtna': 1, 'management': 2, 'possibility': 1, 'groups': 3, 'put': 1, 'affiliated': 1, 'strong': 2, 'security': 1, 'stage': 1, 'behaviour': 1, 'involved': 1, 'route': 2, 'violence': 1, 'dublin': 3, 'fein': 1, 'ensure': 2, 'stand': 1, 'act': 2, 'contingency': 1, 'troublemakers': 2, 'facilitate': 2, 'road': 1, 'members': 1, 'prepared': 1, 'presence': 1, 'sullivan': 2, 'reassure': 1, 'number': 3, 'community': 1, 'strategic': 1, 'visible': 2, 'addressed': 1, 'notify': 1, 'trained': 1, 'eirigi': 1, 'city': 4, 'gpo': 1, 'from': 3, 'crowd': 1, 'visit': 1, 'wood': 1, 'editor': 1, 'peaceful': 4, 'expected': 2, 'today': 1, 'commissioner': 4, 'quay': 1, 'ictu': 1, 'advance': 1, 'murphy': 2, 'gardai': 6, 'aware': 1, 'closures': 1, 'courts': 1, 'branch': 1, 'deployed': 1, 'made': 1, 'thousands': 1, 'socialist': 1, 'work': 1, 'supt': 2, 'feehan': 1, 'mr': 1, 'briefing': 1, 'visited': 
1, 'manner': 1, 'irish': 2, 'metropolitan': 1, 'spotters': 1, 'organisers': 1, 'in': 13, 'dissident': 1, 'evidence': 1, 'tom': 1, 'arrangements': 3, 'experience': 1, 'allowed': 1, 'sought': 1, 'rally': 1, 'connell': 1, 'officers': 3, 'potential': 1, 'holding': 1, 'units': 1, 'place': 2, 'events': 1, 'dignified': 1, 'planned': 1, 'independent': 1, 'added': 2, 'plans': 1, 'congress': 1, 'centre': 3, 'comprehensive': 1, 'measures': 1, 'yesterday': 2, 'alert': 1, 'important': 1, 'moving': 1, 'plan': 2, 'highly': 1, 'law': 2, 'senior': 2, 'fair': 1, 'recent': 1, 'refuse': 1, 'attempt': 1, 'brady': 1, 'liaising': 1, 'conscious': 1, 'light': 1, 'clear': 1, 'headquarters': 1, 'wing': 1, 'chief': 2, 'maintain': 1, 'harcourt': 1, 'order': 2, 'left': 1}}
I have a python script that extracts words from text files and counts the number of times they occur in the file.
I want to add them to an ".ARFF" file to use for weka classification.
Above is an example output of my python script.
How do I go about inserting them into an ARFF file while keeping each text file separate? Each file is differentiated by {"with their words in here!!"}
I know it's pretty easy to generate an ARFF file on your own, but I still wanted to make it simpler, so I wrote a Python package:
https://github.com/ubershmekel/arff
It's also on PyPI, so you can install it with easy_install arff
There are details on the ARFF file format here and it's very simple to generate. For example, using a cut-down version of your Python dictionary, the following script:
import re
d = { 'gardai-plan-crackdown-on-troublemakers-at-protest-2438316.html':
      {'dail': 1,
       'focus': 1,
       'actions': 1,
       'trade': 2,
       'protest': 1,
       'identify': 1 }}
for original_filename in d.keys():
    # Only process keys that look like "<name>.html"
    m = re.search(r'^(.*)\.html$', original_filename)
    if not m:
        print("Ignoring the file:", original_filename)
        continue
    output_filename = m.group(1) + '.arff'
    with open(output_filename, "w") as fp:
        # ARFF header keywords start with "@"
        fp.write('''@RELATION wordcounts
@ATTRIBUTE word string
@ATTRIBUTE count numeric
@DATA
''')
        for word_and_count in d[original_filename].items():
            fp.write("%s,%d\n" % word_and_count)
Generates output of the form:
@RELATION wordcounts
@ATTRIBUTE word string
@ATTRIBUTE count numeric
@DATA
dail,1
focus,1
actions,1
trade,2
protest,1
identify,1
... in a file called gardai-plan-crackdown-on-troublemakers-at-protest-2438316.arff. If that's not exactly what you want, I'm sure you can easily alter it. (For example, if the "words" might have spaces or other punctuation in them, you probably want to quote them.)
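For the quoting case mentioned in the parenthesis, a possible sketch of a data-row helper. The name arff_row is hypothetical, and the escaping convention (single quotes around the word, with backslashes and single quotes backslash-escaped) is an assumption that should be checked against the ARFF reader you use:

```python
def arff_row(word, count):
    """Format one data row, quoting the word so spaces and punctuation survive."""
    # Assumed escaping convention: backslash-escape backslashes and single quotes
    escaped = word.replace("\\", "\\\\").replace("'", "\\'")
    return "'%s',%d" % (escaped, count)

print(arff_row("dail", 1))
print(arff_row("new york", 3))
print(arff_row("don't", 2))
```

In the script above, this would replace the "%s,%d\n" formatting, with a newline appended to each row.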
This project seems to be a bit more up to date. You can install it via
pip:
$ pip install liac-arff
or easy_install:
$ easy_install liac-arff