Dynamically create string from pandas column - python

I have two data frames, df and anomalies, shown below:
d = {'10028': [0], '1058': [25], '20120': [29], '20121': [22],'20122': [0], '20123': [0], '5043': [0], '5046': [0]}
df = pd.DataFrame(data=d)
anomalies is a mirror copy of df, except that each value is either 0 or 1: 1 marks an anomaly and 0 marks a non-anomaly.
d = {'10028': [0], '1058': [1], '20120': [1], '20121': [0],'20122': [0], '20123': [0], '5043': [0], '5046': [0]}
anomalies = pd.DataFrame(data=d)
I am converting them into a specific format with the code below:
details = (
'\n' + 'Metric Name' + '\t' + 'Count' + '\t' + 'Anomaly' +
'\n' + '10028:' + '\t' + str(df.tail(1)['10028'][0]) + '\t' + str(anomalies['10028'][0]) +
'\n' + '1058:' + '\t' + '\t' + str(df.tail(1)['1058'][0]) + '\t' + str(anomalies['1058'][0]) +
'\n' + '20120:' + '\t' + str(df.tail(1)['20120'][0]) + '\t' + str(anomalies['20120'][0]) +
'\n' + '20121:' + '\t' + str(round(df.tail(1)['20121'][0], 2)) + '\t' + str(anomalies['20121'][0]) +
'\n' + '20122:' + '\t' + str(round(df.tail(1)['20122'][0], 2)) + '\t' + str(anomalies['20122'][0]) +
'\n' + '20123:' + '\t' + str(round(df.tail(1)['20123'][0], 3)) + '\t' + str(anomalies['20123'][0]) +
'\n' + '5043:' + '\t' + str(round(df.tail(1)['5043'][0], 3)) + '\t' + str(anomalies['5043'][0]) +
'\n' + '5046:' + '\t' + str(round(df.tail(1)['5046'][0], 3)) + '\t' + str(anomalies['5046'][0]) +
'\n\n' + 'message:' + '\t' +
'Something wrong with the platform as there is a spike in [values where anomalies == 1].'
)
The problem is that the column names change on every run. In this run they are '10028', '1058', '20120', '20121', '20122', '20123', '5043', '5046', but in the next run they might be '10029', '1038', '20121', '20122', '20123', '5083', '5946'.
How can I build details dynamically from whatever columns are present in the data frame, without hard-coding them? In the message I also want to list the names of the columns whose anomaly value is 1.
The values in anomalies will always be either 1 or 0.

Try this:
# first part of the string
s = '\n' + 'Metric Name' + '\t' + 'Count' + '\t' + 'Anomaly'
# dynamically add the data from the last row of df
# (Series.iteritems() was removed in pandas 2.0; items() is the replacement)
for idx, val in df.iloc[-1].items():
    s += f'\n{idx}\t{val}\t{anomalies[idx][0]}'
    # for Python 3.5 and below, use this
    # s += '\n{}\t{}\t{}'.format(idx, val, anomalies[idx][0])
# last part
s += ('\n\n' + 'message:' + '\t' +
      'Something wrong with the platform as there is a spike in [values where anomalies == 1].'
      )
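The loop above still leaves the placeholder in the message. A minimal sketch of one way to fill it in with the names of the flagged columns follows. The helper name build_details is an assumption made for illustration, and the frames are shortened copies of the ones in the question.

```python
import pandas as pd

def build_details(df, anomalies):
    """Build the report from the last row of df and the first row of anomalies.

    build_details is a hypothetical helper name, not part of pandas.
    """
    last_row = df.iloc[-1]
    flags = anomalies.iloc[0]
    lines = ['', 'Metric Name\tCount\tAnomaly']
    for name, count in last_row.items():
        lines.append(f'{name}:\t{count}\t{flags[name]}')
    # Names of the columns whose anomaly flag is 1, in column order
    spiking = [name for name in df.columns if flags[name] == 1]
    lines.append('')
    lines.append('message:\tSomething wrong with the platform as there is a spike in '
                 + ', '.join(spiking) + '.')
    return '\n'.join(lines)

df = pd.DataFrame({'10028': [0], '1058': [25], '20120': [29], '20121': [22]})
anomalies = pd.DataFrame({'10028': [0], '1058': [1], '20120': [1], '20121': [0]})
details = build_details(df, anomalies)
print(details)
```

Because the column names are read from df.columns, nothing has to be hard-coded when the set of metrics changes between runs.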

Related

Writing measurements to a file only writes the first 10 lines

I'm trying to write certain measurements to an output file in python3, but the output file only reflects the first 10 lines
I'm using the following code to write to the file:
f = open("measurements.txt", "w")
for infile in glob.glob("./WAVs/*"):
    #do some stuff with the input file
    f.write(outfile.removesuffix(".wav") + "\t" + str(oldSIL) + "\t" +
            str(oldSPL) + "\t" + str(oldLoud) + "\t" + str(newLoud) + "\t"+
            infile.removesuffix(".wav") + "\n")
f.close()
Looking at measurements.txt I find that only the first 10 lines of the expected output have been written.
If I try to print the same lines instead of writing to a file using
print(outfile.removesuffix(".wav") + "\t" + str(oldSIL) + "\t" + str("oldSPL") + "\t" + str(oldLoud) + "\t" + str(newLoud) + "\t"+ infile.removesuffix(".wav") + "\n")
It correctly prints every single line up to the final index. I'm a little lost as to why this might be the case.

pandas iterate over column values at once and generate range

I have a pandas dataframe like the one below:
df1 = pd.DataFrame({'biz': [18, 23], 'seg': [30, 34], 'PID': [40, 52]})
I would like to pass all the values from each column at once to a for loop.
For example, I am trying the following:
cols = ['biz','seg','PID']
for col in cols:
    for i, j in df1.col.values:
        print("D" + str(i) + ":" + "F" + str(j))
        print("Q" + str(i) + ":" + "S" + str(j))
        print("AB" + str(i) + ":" + "AD" + str(j))
but this doesn't work and I get an error:
TypeError: cannot unpack non-iterable numpy.int64 object
I expect my output to look like this:
D18:F23
Q18:S23
AB18:AD23
D30:F34
Q30:S34
AB30:AD34
D40:F52
Q40:S52
AB40:AD52
The mistake is in the innermost for loop.
You are requesting an iterator over a 1-dimensional array of values. That iterator yields scalars, which cannot be unpacked into i and j.
If your dataframe has exactly 2 rows, then this should suffice:
cols = ['biz','seg','PID']
for col in cols:
    i, j = getattr(df1, col).values
    print("D" + str(i) + ":" + "F" + str(j))
    print("Q" + str(i) + ":" + "S" + str(j))
    print("AB" + str(i) + ":" + "AD" + str(j))
Alternatives
Pandas using loc
This is actually the simplest way to solve it, though it only occurred to me now. We use the column name col with loc; the : in loc[:, col] selects all rows.
cols = ['biz','seg','PID']
for col in cols:
    i, j = df1.loc[:, col].values
Attrgetter
We can use attrgetter from the operator module to fetch one attribute or several at once:
from operator import attrgetter
cols = ['biz','seg','PID']
cols = attrgetter(*cols)(df1)
for col in cols:
    i, j = col.values
Attrgetter 2
This approach is similar to the one above, except that we collect all the i values and all the j values into two tuples, with each entry corresponding to one column.
from operator import attrgetter
cols = ['biz','seg','PID']
cols = attrgetter(*cols)(df1)
cols = [col.values for col in cols]
all_i, all_j = zip(*cols)
Pandas solution
This approach uses just pandas functions. It gets the column index using the df1.columns.get_loc(col_name) function, and then uses .iloc to index the values. In .iloc[a,b] we use : in place of a to select all rows, and index in place of b to select just the column.
cols = ['biz','seg','PID']
for col in cols:
    index = df1.columns.get_loc(col)
    i, j = df1.iloc[:, index]
    # do the printing here
# stop one row early so that i + 1 stays inside the frame
for i in range(len(df1) - 1):
    print('D' + str(df1.iloc[i,0]) + ':' + 'F' + str(df1.iloc[i+1,0]))
    print('Q' + str(df1.iloc[i,0]) + ':' + 'S' + str(df1.iloc[i+1,0]))
    print('AB' + str(df1.iloc[i,0]) + ':' + 'AD' + str(df1.iloc[i+1,0]))
    print('D' + str(df1.iloc[i,1]) + ':' + 'F' + str(df1.iloc[i+1,1]))
    print('Q' + str(df1.iloc[i,1]) + ':' + 'S' + str(df1.iloc[i+1,1]))
    print('AB' + str(df1.iloc[i,1]) + ':' + 'AD' + str(df1.iloc[i+1,1]))
    print('D' + str(df1.iloc[i,2]) + ':' + 'F' + str(df1.iloc[i+1,2]))
    print('Q' + str(df1.iloc[i,2]) + ':' + 'S' + str(df1.iloc[i+1,2]))
    print('AB' + str(df1.iloc[i,2]) + ':' + 'AD' + str(df1.iloc[i+1,2]))
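The column-wise unpacking used in the answers above can be gathered into one self-contained sketch. The function name range_labels and its prefixes parameter are assumptions made for illustration. Like the answers, it assumes every column holds exactly two values.

```python
import pandas as pd

df1 = pd.DataFrame({'biz': [18, 23], 'seg': [30, 34], 'PID': [40, 52]})

def range_labels(frame, prefixes=(('D', 'F'), ('Q', 'S'), ('AB', 'AD'))):
    """Return one 'start:end' label per prefix pair for every two-row column.

    range_labels and prefixes are hypothetical names chosen for this sketch.
    """
    labels = []
    for col in frame.columns:
        i, j = frame[col]  # unpacking works only when the column has exactly 2 rows
        for start, end in prefixes:
            labels.append(f'{start}{i}:{end}{j}')
    return labels

labels = range_labels(df1)
print('\n'.join(labels))
```

Iterating over frame.columns avoids listing the column names by hand, and keeping the prefix pairs in one tuple removes the three near-identical print calls.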

I'm trying to create a series of random numbers in Python

I have a script here that I created to generate a string of random numbers. I do not know how to make the numbers different on every line. For instance, I want it to be like this:
0,1,2,3,5
1,4,7,3,5
2,8,2,1,3
Instead I am getting:
0,1,2,3,5
1,1,2,3,5
2,1,2,3,5
The first number is running in sequence which is great. But I need the other 4 numbers to change.
inter1rand = random.randint(1, 1)
inter1time = inter1rand
inter1 = inter1time
print (inter1)
inter2rand = random.randint(24, 31)
inter2time = inter2rand + inter1time
inter2 = inter2time
print (inter2)
inter3rand = random.randint(34, 55)
inter3time = inter3rand + inter2time
inter3 = inter3time
print (inter3)
mouse1 = random.randint(110, 1100)
mouse2 = random.randint(875, 1100)
mouse3 = random.randint(375, 607)
col_list1 = [str(n) + ',1,' + str(mouse1) + ',' + str(mouse2) + ',' + str(mouse3) + ';' for n in range(0,inter1)]
col_list2 = [str(n) + ',1,' + str(mouse1) + ',' + str(mouse2) + ',' + str(mouse3) + ';' for n in range(inter1,inter2)]
col_list3 = [str(n) + ',1,' + str(mouse1) + ',' + str(mouse2) + ',' + str(mouse3) + ';' for n in range(inter2,inter3)]
#print (col_list1)
#print (col_list1 + col_list2)
#print (col_list1 + col_list2 + col_list3)
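mouse1, mouse2 and mouse3 are each computed once, so every row reuses the same three values. Call random.randint directly inside the list comprehensions so that each row gets fresh values: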
col_list1 = [str(n) + ',1,' + str(random.randint(110, 1100)) + ',' + str(random.randint(875, 1100)) + ',' + str(random.randint(375, 607)) + ';' for n in range(0,inter1)]
col_list2 = [str(n) + ',1,' + str(random.randint(110, 1100)) + ',' + str(random.randint(875, 1100)) + ',' + str(random.randint(375, 607)) + ';' for n in range(inter1,inter2)]
col_list3 = [str(n) + ',1,' + str(random.randint(110, 1100)) + ',' + str(random.randint(875, 1100)) + ',' + str(random.randint(375, 607)) + ';' for n in range(inter2,inter3)]
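The same idea can be written once instead of three times. Below is a minimal sketch in which the helper name make_rows is an assumption made for illustration. The ranges passed to random.randint are the ones from the question.

```python
import random

def make_rows(start, stop):
    """Build one 'n,1,a,b,c;' string per n, drawing new random values for each row.

    make_rows is a hypothetical helper name used only in this sketch.
    """
    rows = []
    for n in range(start, stop):
        # Drawn inside the loop, so every row gets its own values
        mouse1 = random.randint(110, 1100)
        mouse2 = random.randint(875, 1100)
        mouse3 = random.randint(375, 607)
        rows.append(f'{n},1,{mouse1},{mouse2},{mouse3};')
    return rows

rows = make_rows(0, 5)
print('\n'.join(rows))
```

With this helper, col_list1, col_list2 and col_list3 from the question would become make_rows(0, inter1), make_rows(inter1, inter2) and make_rows(inter2, inter3).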
You can also look at NumPy's np.random.choice() function, which draws a sample from a range or from an array:
>>> np.random.choice(10,3)
array([9, 4, 2])
>>> np.random.choice(10,3)
array([6, 7, 8])
>>> np.random.choice(31,3)
array([11, 28, 10])
>>> np.random.choice([0,1],3)
array([0, 1, 1])
This function takes an upper bound (or an explicit array) rather than a low/high range. By default it samples with replacement, so repeated values are possible; pass replace=False to avoid them.
Use the random module and its randint function:
import random
for u in range(3):
    r = [random.randint(0, 10) for i in range(5)]
    for d in r[:-1]:
        print(d, end=', ')
    print(r[-1])
This prints, for example:
9, 6, 0, 4, 1
7, 9, 2, 2, 1
4, 10, 6, 5, 8
There are a lot of ways to do this; I've linked some sources below:
https://www.tutorialspoint.com/generating-random-number-list-in-python
https://docs.python.org/3/library/secrets.html
https://docs.scipy.org/doc/numpy-1.13.0/reference/routines.random.html

Python won't convert tuple to string

The code below is supposed to take the first element of each tuple x and convert it to a string. However, when the last line runs, Python tells me it can't convert from tuple to str.
for x in filelink:
    print(x[0])
    item = str(x[0])
    oldpath = root.wgetdir + "\\" + root.website.get() + "\\" + item
    print(oldpath)
    if os.path.exists(oldpath): shutil.copy(root.wgetdir + "\\" + root.website.get() + "\\" + x, keyworddir + "\\" + item)
This part:
root.wgetdir + "\\" + root.website.get() + "\\" + x
ends with the tuple x instead of item. Use item there, or pass oldpath, which already holds the same path.
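A minimal sketch of the corrected loop follows, assuming the directories are plain strings. The function name copy_first_elements and the parameters source_dir and target_dir are stand-ins invented for illustration; they replace root.wgetdir, root.website.get() and keyworddir from the question. It uses os.path.join instead of concatenating "\\" by hand.

```python
import os
import shutil
import tempfile

def copy_first_elements(filelink, source_dir, target_dir):
    """Copy the file named by the first element of each tuple, if it exists.

    All names here are hypothetical stand-ins for the ones in the question.
    """
    copied = []
    for x in filelink:
        item = str(x[0])  # the string, not the tuple x
        oldpath = os.path.join(source_dir, item)
        if os.path.exists(oldpath):
            shutil.copy(oldpath, os.path.join(target_dir, item))
            copied.append(item)
    return copied

# Small demonstration with temporary directories and made-up file names
source_dir = tempfile.mkdtemp()
target_dir = tempfile.mkdtemp()
with open(os.path.join(source_dir, 'a.txt'), 'w') as handle:
    handle.write('example')
copied = copy_first_elements([('a.txt', 1), ('missing.txt', 2)], source_dir, target_dir)
print(copied)
```

Reusing oldpath in the copy call means the path is built in only one place, so the tuple cannot slip back in.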

Organize dictionary by frequency

I create a dictionary of the most used words and take the top ten. I need the message to list them in order, but I can't do that without making a list, which I can't use. Here is my code. I am aware that dictionaries cannot be sorted, but I still need help.
most_used_words = Counter()
zewDict = Counter(most_used_words).most_common(10)
newDict = dict(zewDict)
keys = newDict.keys()
values = newDict.values()
msg = ('Here is your breakdown of your most used words: \n\n'
'Word | Times Used'
'\n:--:|:--:'
'\n' + str(keys[0]).capitalize() + '|' + str(values[0]) +
'\n' + str(keys[1]).capitalize() + '|' + str(values[1]) +
'\n' + str(keys[2]).capitalize() + '|' + str(values[2]) +
'\n' + str(keys[3]).capitalize() + '|' + str(values[3]) +
'\n' + str(keys[4]).capitalize() + '|' + str(values[4]) +
'\n' + str(keys[5]).capitalize() + '|' + str(values[5]) +
'\n' + str(keys[6]).capitalize() + '|' + str(values[6]) +
'\n' + str(keys[7]).capitalize() + '|' + str(values[7]) +
'\n' + str(keys[8]).capitalize() + '|' + str(values[8]) +
'\n' + str(keys[9]).capitalize() + '|' + str(values[9]))
r.send_message(user, 'Most Used Words', msg)
How would I do it so the msg prints the words in order from most used word on the top to least on the bottom with the correct values for the word?
Edit: I know dictionaries cannot be sorted on their own, so can I work around this somehow?
Once you have the values it's as simple as:
import collections
print('Word | Times Used')
for e, t in collections.Counter(values).most_common(10):
    print("%s|%d" % (e,t))
This prints something like:
Word | Times Used
e|4
d|3
a|2
c|2
From the Docs: most_common([n])
Return a list of the n most common elements and their counts from the
most common to the least. If n is not specified, most_common() returns
all elements in the counter. Elements with equal counts are ordered
arbitrarily:
>>> Counter('abracadabra').most_common(3)
[('a', 5), ('r', 2), ('b', 2)]
Your code can be:
from collections import Counter
c = Counter(most_used_words)
msg = "Here is your breakdown of your most used words:\n\nWords | Times Used\n:--:|:--:\n"
msg += '\n'.join('%s|%s' % (k.capitalize(), v) for (k, v) in c.most_common(10))
r.send_message(user, 'Most Used Words', msg)
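To see the ordering without a message client, here is a minimal self-contained sketch of the same approach. The sample sentence is made up for illustration, and r.send_message is left out because it belongs to the asker's environment.

```python
from collections import Counter

# Made-up sample text; in the question the counts come from most_used_words
words = 'the cat and the dog and the bird'.split()
most_used_words = Counter(words)

header = 'Here is your breakdown of your most used words:\n\nWord | Times Used\n:--:|:--:\n'
# most_common(10) returns (word, count) pairs, highest count first
rows = [f'{word.capitalize()}|{count}' for word, count in most_used_words.most_common(10)]
msg = header + '\n'.join(rows)
print(msg)
```

Because most_common already returns the pairs from most to least frequent, no separate sorting step or intermediate dict is needed.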
import operator
newDict = dict(zewDict)
# dict.iteritems() is Python 2 only; items() works in Python 3
sorted_newDict = sorted(newDict.items(), key=operator.itemgetter(1))
msg = ''
for key, value in sorted_newDict:
    # strings have no append(); build the message with +=
    msg += '\n' + str(key).capitalize() + '|' + str(value)
This sorts by the dictionary values in ascending order. To put the most used word first, add reverse=True to sorted().
