Organize dictionary by frequency - python

I create a dictionary of the most used words and get the top ten. I need to sort it for the list, which should be in order. I can't do that without making a list, which I can't use. Here is my code. I am aware dictionaries cannot be sorted, but I still need help.
most_used_words = Counter()
zewDict = Counter(most_used_words).most_common(10)
newDict = dict(zewDict)
keys = newDict.keys()
values = newDict.values()
msg = ('Here is your breakdown of your most used words: \n\n'
'Word | Times Used'
'\n:--:|:--:'
'\n' + str(keys[0]).capitalize() + '|' + str(values[0]) +
'\n' + str(keys[1]).capitalize() + '|' + str(values[1]) +
'\n' + str(keys[2]).capitalize() + '|' + str(values[2]) +
'\n' + str(keys[3]).capitalize() + '|' + str(values[3]) +
'\n' + str(keys[4]).capitalize() + '|' + str(values[4]) +
'\n' + str(keys[5]).capitalize() + '|' + str(values[5]) +
'\n' + str(keys[6]).capitalize() + '|' + str(values[6]) +
'\n' + str(keys[7]).capitalize() + '|' + str(values[7]) +
'\n' + str(keys[8]).capitalize() + '|' + str(values[8]) +
'\n' + str(keys[9]).capitalize() + '|' + str(values[9]))
r.send_message(user, 'Most Used Words', msg)
How would I do it so the msg prints the words in order from most used word on the top to least on the bottom with the correct values for the word?
Edit: I know dictionaries cannot be sorted on their own, so can I work around this somehow?

Once you have the values it's as simple as:
import collections
print('Word | Times Used')
for e, t in collections.Counter(values).most_common(10):
    print("%s|%d" % (e, t))
This prints something like:
Word | Times Used
e|4
d|3
a|2
c|2

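From the docs (before Python 3.7): most_common([n])
Return a list of the n most common elements and their counts from the
most common to the least. If n is not specified, most_common() returns
all elements in the counter. Elements with equal counts are ordered
arbitrarily:
>>> Counter('abracadabra').most_common(3)
[('a', 5), ('r', 2), ('b', 2)]
Since Python 3.7, elements with equal counts are instead ordered by first occurrence, so the same call returns [('a', 5), ('b', 2), ('r', 2)].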
Your code can be:
from collections import Counter
c = Counter(most_used_words)
msg = "Here is your breakdown of your most used words:\n\nWords | Times Used\n:--:|:--:\n"
msg += '\n'.join('%s|%s' % (k.capitalize(), v) for (k, v) in c.most_common(10))
r.send_message(user, 'Most Used Words', msg)
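The key point is that most_common() already returns an ordered list of (word, count) pairs, so there is no need to convert it back into a dict (which is where the ordering gets lost in the question's code). A minimal self-contained sketch of that idea; build_word_table and the sample sentence are hypothetical, and the header mirrors the question's table format. It accepts either an iterable of words or an existing Counter.

```python
from collections import Counter


def build_word_table(words, n=10):
    """Hypothetical helper: build the 'Word | Times Used' table, most used first."""
    counts = Counter(words)  # also works if `words` is already a Counter
    lines = ['Word | Times Used', ':--:|:--:']
    # most_common() yields (word, count) pairs sorted by count, highest first
    for word, count in counts.most_common(n):
        lines.append(f'{word.capitalize()}|{count}')
    return '\n'.join(lines)


sample = 'the cat and the dog and the bird'.split()  # hypothetical input
table = build_word_table(sample)
print(table)
```

On Python 3.7+ words with equal counts keep their first-occurrence order; on older versions their relative order is arbitrary.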

import operator
newDict = dict(zewDict)
# Sort the (word, count) pairs by count, highest first
sorted_newDict = sorted(newDict.items(), key=operator.itemgetter(1), reverse=True)
msg = ''
for key, value in sorted_newDict:
    # Strings have no append(); build the message by concatenation
    msg += '\n' + str(key).capitalize() + '|' + str(value)
This sorts by the dictionary values, with the most used word first because of reverse=True. Omit reverse=True if you want ascending order.

Related

Dynamically create string from pandas column

I have two data frames, shown below: one is df and the other is anomalies:
d = {'10028': [0], '1058': [25], '20120': [29], '20121': [22],'20122': [0], '20123': [0], '5043': [0], '5046': [0]}
df = pd.DataFrame(data=d)
Basically, anomalies is a mirror copy of df, except that each of its values is 0 or 1: 1 marks an anomaly and 0 marks a non-anomaly.
d = {'10028': [0], '1058': [1], '20120': [1], '20121': [0],'20122': [0], '20123': [0], '5043': [0], '5046': [0]}
anomalies = pd.DataFrame(data=d)
I convert these into a specific format with the code below:
details = (
'\n' + 'Metric Name' + '\t' + 'Count' + '\t' + 'Anomaly' +
'\n' + '10028:' + '\t' + str(df.tail(1)['10028'][0]) + '\t' + str(anomalies['10028'][0]) +
'\n' + '1058:' + '\t' + '\t' + str(df.tail(1)['1058'][0]) + '\t' + str(anomalies['1058'][0]) +
'\n' + '20120:' + '\t' + str(df.tail(1)['20120'][0]) + '\t' + str(anomalies['20120'][0]) +
'\n' + '20121:' + '\t' + str(round(df.tail(1)['20121'][0], 2)) + '\t' + str(anomalies['20121'][0]) +
'\n' + '20122:' + '\t' + str(round(df.tail(1)['20122'][0], 2)) + '\t' + str(anomalies['20122'][0]) +
'\n' + '20123:' + '\t' + str(round(df.tail(1)['20123'][0], 3)) + '\t' + str(anomalies['20123'][0]) +
'\n' + '5043:' + '\t' + str(round(df.tail(1)['5043'][0], 3)) + '\t' + str(anomalies['5043'][0]) +
'\n' + '5046:' + '\t' + str(round(df.tail(1)['5046'][0], 3)) + '\t' + str(anomalies['5046'][0]) +
'\n\n' + 'message:' + '\t' +
'Something wrong with the platform as there is a spike in [values where anomalies == 1].'
)
The problem is that the column names change on every run: in this run they are '10028', '1058', '20120', '20121', '20122', '20123', '5043', '5046', but in the next run they might be '10029', '1038', '20121', '20122', '20123', '5083', '5946'.
How can I build details dynamically from whichever columns are present in the data frame, without hard-coding them? In the message, I also want to include the names of the columns whose anomaly value is 1.
The values in anomalies will always be either 1 or 0.
Try this:
# first part of the string
s = '\n' + 'Metric Name' + '\t' + 'Count' + '\t' + 'Anomaly'
# dynamically add one line per column of the last row
# (Series.iteritems() was removed in pandas 2.0; items() works in old and new versions)
for idx, val in df.iloc[-1].items():
    s += f'\n{idx}\t{val}\t{anomalies[idx][0]}'
    # for Python 3.5 and below, use this instead:
    # s += '\n{}\t{}\t{}'.format(idx, val, anomalies[idx][0])
# last part
s += ('\n\n' + 'message:' + '\t' +
      'Something wrong with the platform as there is a spike in [values where anomalies == 1].'
      )
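The answer above still leaves the "[values where anomalies == 1]" placeholder in the message. A minimal sketch that also fills it in, assuming both frames have the same columns and that only the last row matters; build_details is a hypothetical helper name, and the sample frames are the ones from the question.

```python
import pandas as pd


def build_details(df, anomalies):
    """Hypothetical helper: table of last-row counts plus a message naming anomalous columns."""
    last = df.iloc[-1]          # latest counts, indexed by column name
    flags = anomalies.iloc[-1]  # matching 0/1 anomaly flags
    lines = ['', 'Metric Name\tCount\tAnomaly']
    for col, val in last.items():
        lines.append(f'{col}\t{val}\t{flags[col]}')
    # Collect the names of the columns flagged with 1, in column order
    anomalous = [col for col, flag in flags.items() if flag == 1]
    names = ', '.join(anomalous) if anomalous else 'none'
    lines.append('')
    lines.append('message:\tSomething wrong with the platform as there is a spike in '
                 + names + '.')
    return '\n'.join(lines)


df = pd.DataFrame({'10028': [0], '1058': [25], '20120': [29], '20121': [22],
                   '20122': [0], '20123': [0], '5043': [0], '5046': [0]})
anomalies = pd.DataFrame({'10028': [0], '1058': [1], '20120': [1], '20121': [0],
                          '20122': [0], '20123': [0], '5043': [0], '5046': [0]})
details = build_details(df, anomalies)
print(details)
```

Because the loop walks whatever columns df has, renamed or added columns on the next run need no code changes.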

SymPy - Treating numbers as symbols

How can I treat numbers as symbols in SymPy?
For example, if I am performing a factorization with symbols I get:
from sympy import factor
factor('a*c*d + a*c*e + a*c*f + b*c*d + b*c*e + b*c*f')
c*(a + b)*(d + e + f)
I would like the same behaviour when I am using numbers in the expression.
Instead of
from sympy import factor
factor('2006*c*d + 2006*c*e + 2006*c*f + 2007*c*d + 2007*c*e + 2007*c*f')
4013*c*(d + e + f)
I would like to get
from sympy import factor
factor('2006*c*d + 2006*c*e + 2006*c*f + 2007*c*d + 2007*c*e + 2007*c*f')
c*(2006 + 2007)*(d + e + f)
1. Replace each constant with a unique symbol.
2. Factor the resulting expression.
3. Replace the unique symbols with the constants.
For your given case, something like this:
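from sympy import factor
simple = factor('const2006*c*d + const2006*c*e + const2006*c*f + const2007*c*d + const2007*c*e + const2007*c*f')
# simple is a SymPy expression, so convert it to a string before stripping the prefix
print(str(simple).replace("const", ''))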
This should give you the desired output. You can identify numeric tokens in the expression with a straightforward regex or a trivial parser; both are covered in many other places.
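The three steps can be automated with a regex. A minimal sketch, assuming the expression contains only non-negative integer coefficients (no decimals or exponents such as c**2, which this regex would also rewrite) and no existing symbol names containing "const"; factor_keeping_numbers is a hypothetical helper name.

```python
import re

from sympy import factor, sympify


def factor_keeping_numbers(expr_str):
    """Hypothetical helper: factor while keeping integer constants unevaluated."""
    # Step 1: turn each integer literal into a unique symbol name, e.g. 2006 -> const2006
    tokenized = re.sub(r'\b(\d+)\b', r'const\1', expr_str)
    # Step 2: factor the purely symbolic expression
    factored = factor(sympify(tokenized))
    # Step 3: strip the prefix so the constants reappear as plain numbers
    return str(factored).replace('const', '')


result = factor_keeping_numbers(
    '2006*c*d + 2006*c*e + 2006*c*f + 2007*c*d + 2007*c*e + 2007*c*f')
print(result)
```

The result is a string; passing it back through sympify would evaluate 2006 + 2007 again.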
Symbol trickery to the rescue: replace your numbers with Symbols having a name given by the number. In your case you don't have to watch for negative versions so the following is straightforward:
>>> from sympy import S, Symbol, Integer, factor
>>> s = '2006*c*d + 2006*c*e + 2006*c*f + 2007*c*d + 2007*c*e + 2007*c*f'
>>> eq = S(s, evaluate=False); eq
2006*c*d + 2007*c*d + 2006*c*e + 2007*c*e + 2006*c*f + 2007*c*f
>>> reps = dict([(i,Symbol(str(i))) for i in _.atoms(Integer)]); reps
{2006: 2006, 2007: 2007}
>>> factor(eq.subs(reps))
c*(2006 + 2007)*(d + e + f)
Note: the evaluate=False is used to keep the like-terms from combining to give 4013*c*d + 4013*c*e + 4013*c*f.

Returning original string with symbols between each character

I'm trying to make my program return the exact same string but with ** between each character. Here's my code.
def separate(st):
    total = " "
    n = len(st + st[-1])
    for i in range(n):
        total = str(total) + str(i) + str("**")
    return total
x = separate("12abc3")
print(x)
This should return:
1**2**a**b**c**3**
However, I'm getting 0**1**2**3**4**5**6**.
You can join the characters in the string together with "**" as the separator (this works because str.join accepts any iterable of strings, and iterating over a string yields its characters). To get the additional "**" at the end, just concatenate it.
Here's an example:
def separate(st):
    return "**".join(st) + "**"
Sample:
x = separate("12abc3")
print(x) # "1**2**a**b**c**3**"
A note on your posted code:
The reason you get that output is that you loop with for i in range(n):, so the iteration variable i takes each index in st, not each character. When you then compute str(total) + str(i) + str("**"), you convert i to a string, and i is just the index (from 0 to n-1).
To fix that, you could iterate over the characters in st directly, like this:
for c in st:
    total = total + c + "**"
or use the index i to get the character at each position in st, like this:
for i in range(len(st)):
    total = total + st[i] + "**"
Welcome to Stack Overflow!
I will explain part of your code line by line.
for i in range(n): since you provide only one argument (the stopping point), this loops with i = 0, 1, 2, ..., n-1.
total = str(total) + str(i) + str("**"): this appends i (the zero-based index of the current iteration) and ** to the current total string, which is why the numbers appear in sequence in the result.
What you should do instead is total = str(total) + st[i] + str("**"), so that it adds each character of st one by one.
In addition, initialize n as n = len(st). With n = len(st + st[-1]) the loop runs one step too far, and st[i] would raise an IndexError on the last iteration. Starting with total = "" instead of " " also avoids a leading space.
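Putting those fixes together, a loop-based version of the question's function might look like this (a sketch; the join-based version is shorter and does the same thing):

```python
def separate(st):
    total = ""  # start empty so the result has no leading space
    for ch in st:  # iterate over characters, not indices
        total += ch + "**"
    return total


print(separate("12abc3"))  # 1**2**a**b**c**3**
```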

Append the string in python for forloop

I need to collect every result value into a single Python list and print it, but I haven't been able to. Can anyone help me with this? I am comparing one system's data with another system's data and identifying the mismatches in each field.
For example, a table with kid_id, name, age, gender, and address exists in two different systems. I need to make sure all the kids' data was moved correctly from the 1data system to the 2data system.
Emp_id values are like 1, 2, 3, 4, 5, 6.
data2 = self.get2Data(kid_id)
data1 = self.get1Data(kid_id)
for i in range(len(data1)):
    for key, value in data1[i].items():
        if data1[i][key] == data2[i][key]:
            result = str("LKG") + "," + str(kid_id) + "," + str("PASS") + "," + str(key)
        else:
            result = str("LKG") + "," + str(kid_id) + "," + str("FAIL") + "," + str(key)
        MatchResult = result.split()
        print(MatchResult)
print("***It is Done*****")
Currently my output looks like this:
['LKG,100,PASS,address']
['LKG,102,FAIL,dob']
['LKG,105,FAIL,gender']
but I need it in this form:
(['LKG,100,PASS,address'],['LKG,102,FAIL,dob'],['LKG,105,FAIL,gender'])
or
[('LKG,100,PASS,address'),('LKG,102,FAIL,dob'),('LKG,105,FAIL,gender')]
Code details: the code above compares the data from the two systems and prints the pass and fail cases in the format shown. In the result above, address passes while dob and gender fail, which means there is still a data mismatch in the dob and gender fields for kids 102 and 105.
Move the list variable's declaration before the loop, initialize it to an empty list, and then append each result to it.
data2 = self.get2Data(kid_id)
data1 = self.get1Data(kid_id)
MatchResult = []  # created once, before the loop
for i in range(len(data1)):
    for key, value in data1[i].items():
        if data1[i][key] == data2[i][key]:
            result = str("LKG") + "," + str(kid_id) + "," + str("PASS") + "," + str(key)
        else:
            result = str("LKG") + "," + str(kid_id) + "," + str("FAIL") + "," + str(key)
        MatchResult.append(result.split())
print(MatchResult)  # printed once, after all comparisons
print("***It is Done*****")
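The second desired form, [('LKG,100,PASS,address'), ...], is really a list of plain strings, because parentheses around a single string do not create a tuple. A minimal sketch that builds that list directly, assuming each system returns a list of dicts with the same keys in the same row order; compare_records and the sample records below are hypothetical stand-ins for self.get1Data/self.get2Data.

```python
def compare_records(kid_id, data1, data2):
    """Hypothetical helper: one 'LKG,<id>,PASS|FAIL,<field>' string per field."""
    results = []  # single list that collects every comparison
    for row1, row2 in zip(data1, data2):
        for key in row1:
            status = "PASS" if row1[key] == row2.get(key) else "FAIL"
            results.append(",".join(["LKG", str(kid_id), status, key]))
    return results


# Hypothetical sample records from the two systems
data1 = [{'name': 'Ann', 'dob': '2015-01-01', 'gender': 'F'}]
data2 = [{'name': 'Ann', 'dob': '2015-02-01', 'gender': 'F'}]
result = compare_records(102, data1, data2)
print(result)  # ['LKG,102,PASS,name', 'LKG,102,FAIL,dob', 'LKG,102,PASS,gender']
```

Wrap the call in tuple(...) if you need the first, tuple-shaped form instead.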

Formatting output when writing a list to textfile

I have a list of lists that looks like this:
dupe = [['95d1543adea47e88923c3d4ad56e9f65c2b40c76', 'ron\\c', 'apa.txt'], ['95d1543adea47e88923c3d4ad56e9f65c2b40c76', 'ron\\c', 'knark.txt'], ['b5cc17d3a35877ca8b76f0b2e07497039c250696', 'ron\\a', 'apa2.txt'], ['b5cc17d3a35877ca8b76f0b2e07497039c250696', 'ron\\a', 'jude.txt']]
I write it to a file using a very basic function:
try:
    file_name = open("dupe.txt", "w")
except IOError:
    pass
for a in range(len(dupe)):
    file_name.write(dupe[a][0] + " " + dupe[a][1] + " " + dupe[a][2] + "\n")
file_name.close()
With the output in the file looking like this:
95d1543adea47e88923c3d4ad56e9f65c2b40c76 ron\c apa.txt
95d1543adea47e88923c3d4ad56e9f65c2b40c76 ron\c knark.txt
b5cc17d3a35877ca8b76f0b2e07497039c250696 ron\a apa2.txt
b5cc17d3a35877ca8b76f0b2e07497039c250696 ron\a jude.txt
How can I make the output in the dupe.txt file look like this:
95d1543adea47e88923c3d4ad56e9f65c2b40c76 ron\c apa.txt, knark.txt
b5cc17d3a35877ca8b76f0b2e07497039c250696 ron\a apa2.txt, jude.txt
First, group the lines by the "key" (the first two elements of each list):
dupedict = {}
for a, b, c in dupe:
    dupedict.setdefault((a, b), []).append(c)
Then print it out:
for key, values in dupedict.items():
    print(' '.join(key), ', '.join(values))
I take it your last question didn't solve your problem?
Instead of putting each entry with a repeated ID and directory in a separate list, why not make the file element of each list a sublist containing all the files that share the same ID and directory?
So dupe would look like this:
dupe = [['95d1543adea47e88923c3d4ad56e9f65c2b40c76', 'ron\\c', ['apa.txt', 'knark.txt']],
        ['b5cc17d3a35877ca8b76f0b2e07497039c250696', 'ron\\a', ['apa2.txt', 'jude.txt']]]
then your print loop could be similar to:
for i in dupe:
    # hash, directory, then the grouped file names separated by commas
    print(i[0], i[1], ', '.join(i[2]))
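If dupe arrives in the flat form from the question, it can be converted into the nested structure suggested above. A minimal sketch, assuming Python 3.7+ so that the dict keeps first-seen order; nest is a hypothetical helper name.

```python
def nest(flat):
    """Hypothetical helper: [[hash, dir, file], ...] -> [[hash, dir, [files...]], ...]."""
    grouped = {}
    for file_hash, dirname, filename in flat:
        # one entry per (hash, directory) pair, collecting its file names
        grouped.setdefault((file_hash, dirname), []).append(filename)
    return [[h, d, files] for (h, d), files in grouped.items()]


flat = [['95d1543adea47e88923c3d4ad56e9f65c2b40c76', 'ron\\c', 'apa.txt'],
        ['95d1543adea47e88923c3d4ad56e9f65c2b40c76', 'ron\\c', 'knark.txt'],
        ['b5cc17d3a35877ca8b76f0b2e07497039c250696', 'ron\\a', 'apa2.txt'],
        ['b5cc17d3a35877ca8b76f0b2e07497039c250696', 'ron\\a', 'jude.txt']]
nested = nest(flat)
for file_hash, dirname, files in nested:
    print(file_hash, dirname, ', '.join(files))
```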
from collections import defaultdict
dupe = [
    ['95d1543adea47e88923c3d4ad56e9f65c2b40c76', 'ron\\c', 'apa.txt'],
    ['95d1543adea47e88923c3d4ad56e9f65c2b40c76', 'ron\\c', 'knark.txt'],
    ['b5cc17d3a35877ca8b76f0b2e07497039c250696', 'ron\\a', 'apa2.txt'],
    ['b5cc17d3a35877ca8b76f0b2e07497039c250696', 'ron\\a', 'jude.txt'],
]
with open("dupe.txt", "w") as f:
    data = defaultdict(list)
    for file_hash, dirname, fn in dupe:
        data[(file_hash, dirname)].append(fn)
    for hash_dir, fns in data.items():
        f.write("{0[0]} {0[1]} {1}\n".format(hash_dir, ', '.join(fns)))
Use a dict to group them:
data = [['95d1543adea47e88923c3d4ad56e9f65c2b40c76', 'ron\\c', 'apa.txt'],
        ['95d1543adea47e88923c3d4ad56e9f65c2b40c76', 'ron\\c', 'knark.txt'],
        ['b5cc17d3a35877ca8b76f0b2e07497039c250696', 'ron\\a', 'apa2.txt'],
        ['b5cc17d3a35877ca8b76f0b2e07497039c250696', 'ron\\a', 'jude.txt']]
dupes = {}
for row in data:
    # `in` replaces dict.has_key(), which no longer exists in Python 3
    if row[0] in dupes:
        dupes[row[0]].append(row)
    else:
        dupes[row[0]] = [row]
for dupe in dupes.values():
    print("%s\t%s\t%s" % (dupe[0][0], dupe[0][1], ",".join([x[2] for x in dupe])))
If this is your actual data, you can either:
Output one line for every two elements in dupe. This is easier. Or,
If your data isn't as structured, make a dictionary where your long hash is the key and the tail end of the string is the value you output. Make sense?
For idea one, you could do something like this:
tmp_string = ""
for a in range(len(dupe)):
    if a % 2 == 0:
        # first element of a pair: start a new line
        tmp_string = dupe[a][0] + " " + dupe[a][1] + " " + dupe[a][2]
    else:
        # second element: add its file name and write the finished line
        tmp_string += ", " + dupe[a][2]
        file_name.write(tmp_string + "\n")
For idea two, you might have something like this:
x = dict()
for a in range(len(dupe)):
    # check whether this hash is already a key in x
    if dupe[a][0] in x:
        x[dupe[a][0]] += ", " + dupe[a][2]
    else:
        x[dupe[a][0]] = dupe[a][0] + " " + dupe[a][1] + " " + dupe[a][2]
for b in x:  # for every key in dictionary x
    file_name.write(x[b] + "\n")
