I have a pandas DataFrame like the one below:
df1 = pd.DataFrame({'biz': [18, 23], 'seg': [30, 34], 'PID': [40, 52]})
I would like to pass all the values from each column to a for loop at once.
For example, I am trying the following:
cols = ['biz','seg','PID']
for col in cols:
    for i, j in df1.col.values:
        print("D" + str(i) + ":" + "F" + str(j))
        print("Q" + str(i) + ":" + "S" + str(j))
        print("AB" + str(i) + ":" + "AD" + str(j))
but this doesn't work and I get an error:
TypeError: cannot unpack non-iterable numpy.int64 object
I expect my output to look like this:
D18:F23
Q18:S23
AB18:AD23
D30:F34
Q30:S34
AB30:AD34
D40:F52
Q40:S52
AB40:AD52
The mistake is in the innermost for loop.
You are requesting an iterator over a 1-dimensional array of values. This iterator yields scalar values one at a time, and a scalar cannot be unpacked into i and j.
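The same behaviour can be reproduced without pandas. This is a minimal sketch in which a plain list (the hypothetical name values) stands in for the values of one two-row column:

```python
values = [18, 23]  # stand-in for the values of a single two-row column

# Iterating yields one scalar per step, so unpacking each step into two names fails.
error = None
try:
    for i, j in values:
        pass
except TypeError as exc:
    error = exc
print(type(error).__name__)  # TypeError

# Unpacking the whole sequence in one assignment works.
i, j = values
print("D" + str(i) + ":" + "F" + str(j))  # D18:F23
```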
If your DataFrame has exactly two values per column, then this should suffice:
cols = ['biz','seg','PID']
for col in cols:
    i, j = getattr(df1, col).values
    print("D" + str(i) + ":" + "F" + str(j))
    print("Q" + str(i) + ":" + "S" + str(j))
    print("AB" + str(i) + ":" + "AD" + str(j))
Alternatives
Pandas using loc
This is actually the simplest way to solve it, though it only occurred to me now. We use the column name col with loc; the : in loc[:, col] selects all rows.
cols = ['biz','seg','PID']
for col in cols:
    i, j = df1.loc[:, col].values
Attrgetter
We can use attrgetter from the operator module to fetch one attribute, or as many as we want, in a single call:
from operator import attrgetter
cols = ['biz','seg','PID']
cols = attrgetter(*cols)(df1)
for col in cols:
    i, j = col.values
Attrgetter 2
This approach is similar to the one above, except that we select multiple columns and collect the i and j values in two tuples, with each entry corresponding to one column.
from operator import attrgetter
cols = ['biz','seg','PID']
cols = attrgetter(*cols)(df1)
cols = [col.values for col in cols]
all_i, all_j = zip(*cols)
Pandas solution
This approach uses just pandas functions. It gets the column index using the df1.columns.get_loc(col_name) function, and then uses .iloc to index the values. In .iloc[a,b] we use : in place of a to select all rows, and index in place of b to select just the column.
cols = ['biz','seg','PID']
for col in cols:
    index = df1.columns.get_loc(col)
    i, j = df1.iloc[:, index]
    # do the printing here
# Pair each row with the next one; stop one row early so that i + 1 stays in range.
for i in range(len(df1) - 1):
    print('D' + str(df1.iloc[i,0]) + ':' + 'F' + str(df1.iloc[i+1,0]))
    print('Q' + str(df1.iloc[i,0]) + ':' + 'S' + str(df1.iloc[i+1,0]))
    print('AB' + str(df1.iloc[i,0]) + ':' + 'AD' + str(df1.iloc[i+1,0]))
    print('D' + str(df1.iloc[i,1]) + ':' + 'F' + str(df1.iloc[i+1,1]))
    print('Q' + str(df1.iloc[i,1]) + ':' + 'S' + str(df1.iloc[i+1,1]))
    print('AB' + str(df1.iloc[i,1]) + ':' + 'AD' + str(df1.iloc[i+1,1]))
    print('D' + str(df1.iloc[i,2]) + ':' + 'F' + str(df1.iloc[i+1,2]))
    print('Q' + str(df1.iloc[i,2]) + ':' + 'S' + str(df1.iloc[i+1,2]))
    print('AB' + str(df1.iloc[i,2]) + ':' + 'AD' + str(df1.iloc[i+1,2]))
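The nine print calls above differ only in the column and the letter pair, so they can be generated by two nested loops. This is a rough sketch, assuming a plain dict (the hypothetical name data) as a stand-in for df1 and a hypothetical letter_pairs list for the prefixes:

```python
data = {'biz': [18, 23], 'seg': [30, 34], 'PID': [40, 52]}  # stand-in for df1
letter_pairs = [('D', 'F'), ('Q', 'S'), ('AB', 'AD')]        # prefix for i, prefix for j

lines = []
for col, (i, j) in data.items():
    # Each column holds exactly two values, so it unpacks into i and j.
    for start, end in letter_pairs:
        lines.append(f"{start}{i}:{end}{j}")

print("\n".join(lines))
```

With the real DataFrame, each pair could come from df1[col].values instead of the dict.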
I have two DataFrames, df and anomalies, shown below:
d = {'10028': [0], '1058': [25], '20120': [29], '20121': [22],'20122': [0], '20123': [0], '5043': [0], '5046': [0]}
df = pd.DataFrame(data=d)
Basically, anomalies is a mirror copy of df, except that each value is 0 or 1: a 1 marks an anomaly and a 0 marks a non-anomaly.
d = {'10028': [0], '1058': [1], '20120': [1], '20121': [0],'20122': [0], '20123': [0], '5043': [0], '5046': [0]}
anomalies = pd.DataFrame(data=d)
and I am converting them into a specific format with the code below:
details = (
'\n' + 'Metric Name' + '\t' + 'Count' + '\t' + 'Anomaly' +
'\n' + '10028:' + '\t' + str(df.tail(1)['10028'][0]) + '\t' + str(anomalies['10028'][0]) +
'\n' + '1058:' + '\t' + '\t' + str(df.tail(1)['1058'][0]) + '\t' + str(anomalies['1058'][0]) +
'\n' + '20120:' + '\t' + str(df.tail(1)['20120'][0]) + '\t' + str(anomalies['20120'][0]) +
'\n' + '20121:' + '\t' + str(round(df.tail(1)['20121'][0], 2)) + '\t' + str(anomalies['20121'][0]) +
'\n' + '20122:' + '\t' + str(round(df.tail(1)['20122'][0], 2)) + '\t' + str(anomalies['20122'][0]) +
'\n' + '20123:' + '\t' + str(round(df.tail(1)['20123'][0], 3)) + '\t' + str(anomalies['20123'][0]) +
'\n' + '5043:' + '\t' + str(round(df.tail(1)['5043'][0], 3)) + '\t' + str(anomalies['5043'][0]) +
'\n' + '5046:' + '\t' + str(round(df.tail(1)['5046'][0], 3)) + '\t' + str(anomalies['5046'][0]) +
'\n\n' + 'message:' + '\t' +
'Something wrong with the platform as there is a spike in [values where anomalies == 1].'
)
The problem is that the column names change on every run. In this run they are '10028', '1058', '20120', '20121', '20122', '20123', '5043', '5046', but in the next run they might be '10029', '1038', '20121', '20122', '20123', '5083', '5946'.
How can I build details dynamically from whatever columns are present in the DataFrame, without hard-coding them? In the message, I also want to list the names of the columns whose anomaly value is 1.
The anomaly values will always be either 1 or 0.
Try this:
# first part of the string
s = '\n' + 'Metric Name' + '\t' + 'Count' + '\t' + 'Anomaly'
# dynamically add the data
# Series.items() is used here because iteritems() was removed in pandas 2.0
for idx, val in df.iloc[-1].items():
    s += f'\n{idx}\t{val}\t{anomalies[idx][0]}'
    # for Python 3.5 and below, use this
    # s += '\n{}\t{}\t{}'.format(idx, val, anomalies[idx][0])
# last part
s += ('\n\n' + 'message:' + '\t' +
'Something wrong with the platform as there is a spike in [values where anomalies == 1].'
)
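The message above still carries the placeholder for the anomalous column names. One possible way to fill it in is sketched below, using plain dicts as stand-ins for the last row of df and of anomalies; the names build_details, counts, and flags are hypothetical and not part of the original code:

```python
def build_details(counts, flags):
    """Build the report from two dicts keyed by metric name."""
    lines = ['', 'Metric Name\tCount\tAnomaly']
    for name, count in counts.items():
        lines.append(f'{name}:\t{count}\t{flags[name]}')
    # Collect the metrics whose anomaly flag is 1.
    spiking = [name for name, flag in flags.items() if flag == 1]
    lines.append('')
    lines.append('message:\tSomething wrong with the platform as there is a spike in '
                 + ', '.join(spiking) + '.')
    return '\n'.join(lines)

counts = {'10028': 0, '1058': 25, '20120': 29, '20121': 22}
flags = {'10028': 0, '1058': 1, '20120': 1, '20121': 0}
details = build_details(counts, flags)
print(details)
```

With the DataFrames from the question, the two dicts could be obtained with df.iloc[-1].to_dict() and anomalies.iloc[-1].to_dict().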
How can I treat numbers as symbols in SymPy?
For example, if I am performing a factorization with symbols I get:
from sympy import factor
factor('a*c*d + a*c*e + a*c*f + b*c*d + b*c*e + b*c*f')
c*(a + b)*(d + e + f)
I would like the same behaviour when I am using numbers in the expression.
Instead of
from sympy import factor
factor('2006*c*d + 2006*c*e + 2006*c*f + 2007*c*d + 2007*c*e + 2007*c*f')
4013*c*(d + e + f)
I would like to get
from sympy import factor
factor('2006*c*d + 2006*c*e + 2006*c*f + 2007*c*d + 2007*c*e + 2007*c*f')
c*(2006 + 2007)*(d + e + f)
1. Replace each constant with a unique symbol.
2. Factor the resulting expression.
3. Replace the unique symbols with the constants.
For your given case, something like this:
simple = factor('const2006*c*d + const2006*c*e + const2006*c*f + const2007*c*d + const2007*c*e + const2007*c*f')
# factor returns a SymPy expression, so convert it to a string before stripping the prefix
print(str(simple).replace("const", ''))
This should give you the desired output. You can identify numeric tokens in the expression with a straightforward regex or trivial parser -- either of which is covered in many other locations.
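As a rough illustration of the regex route, the sketch below tags and untags integer tokens using only the standard re module. The helper names tag_numbers and untag_numbers and the const prefix are assumptions for illustration, and the pattern handles plain integers only (no decimals or negative literals):

```python
import re

def tag_numbers(expr, prefix='const'):
    """Prefix every standalone integer token so it reads as a symbol name."""
    return re.sub(r'\b(\d+)\b', lambda m: prefix + m.group(1), expr)

def untag_numbers(expr, prefix='const'):
    """Strip the prefix again, e.g. from the string form of the factored result."""
    return re.sub(r'\b' + re.escape(prefix) + r'(\d+)\b', r'\1', expr)

s = '2006*c*d + 2006*c*e + 2007*c*f'
tagged = tag_numbers(s)
print(tagged)                 # const2006*c*d + const2006*c*e + const2007*c*f
print(untag_numbers(tagged))  # 2006*c*d + 2006*c*e + 2007*c*f
```

The tagged string is what would be passed to factor, and untag_numbers would be applied to str() of the result.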
Symbol trickery to the rescue: replace your numbers with Symbols having a name given by the number. In your case you don't have to watch for negative versions so the following is straightforward:
>>> from sympy import S, Symbol, Integer, factor
>>> s = '2006*c*d + 2006*c*e + 2006*c*f + 2007*c*d + 2007*c*e + 2007*c*f'
>>> eq = S(s, evaluate=False); eq
2006*c*d + 2007*c*d + 2006*c*e + 2007*c*e + 2006*c*f + 2007*c*f
>>> reps = dict([(i,Symbol(str(i))) for i in eq.atoms(Integer)]); reps
{2006: 2006, 2007: 2007}
>>> factor(eq.subs(reps))
c*(2006 + 2007)*(d + e + f)
Note: the evaluate=False is used to keep the like-terms from combining to give 4013*c*d + 4013*c*e + 4013*c*f.
Text File explained:
Joe,Bloggs,J.bloggs#anemailaddress.com,01269512355, 1 ,0, 0, 0, 0
Fname, Lname, Email, number, Value i want checking ^ , ...,...,...,...
Objective: check that the number at index 4 of each line (the fifth field) is not 0 or 7.
If it is 0 then it changes to 1, and if it is 7 then it changes to 6.
So if it's 0 then + 1, if it's 7 then -1.
Text file:
Joe,Bloggs,J.bloggs#anemailaddress.com,01269512355, 1,0, 0, 0, 0
Sarah,Brown,S.brown#anemailaddress.com,01866522555, 5,0, 0, 0, 0
Andrew,Smith,A.smith#anemailaddress.com,01899512785, 7,0, 0, 0, 0
Kevin,White,K.white#anemailaddress.com,01579122345, 0,0, 0, 0, 0
Samantha,Collins,S.collins#anemailaddress.com,04269916257, 0,0, 0, 0, 0
After the code has run, it should look like this:
Joe,Bloggs,J.bloggs#anemailaddress.com,01269512355, 1,0, 0, 0, 0
Sarah,Brown,S.brown#anemailaddress.com,01866522555, 5,0, 0, 0, 0
Andrew,Smith,A.smith#anemailaddress.com,01899512785, 6,0, 0, 0, 0
Kevin,White,K.white#anemailaddress.com,01579122345, 1,0, 0, 0, 0
Samantha,Collins,S.collins#anemailaddress.com,04269916257, 1,0, 0, 0, 0
The code I have so far produces an error:
fileinfo[j] = i[0] + ',' + i[1] + ',' + i[2] + ',' + i[3] + ',' + str(value) + ',' + i[5] + ',' + i[6] + ',' + i[7]+ ',' + i[8] + '\n'
IndexError: list assignment index out of range
Code:
f = open ("players.txt","r")
fileinfo = f.readlines()
f.close()
j = 0
for i in fileinfo:
    i = i.strip()
    i = i.split(",")
    value = int(i[4])
    if value == "0":
        value = value + 1
    fileinfo[j] = i[0] + ',' + i[1] + ',' + i[2] + ',' + i[3] + ',' + str(value) + ',' + i[5] + ',' + i[6] + ',' + i[7]+ ',' + i[8] + '\n'
    j = j + 1
    if value == "7":
        value = value - 1
    fileinfo[j] = i[0] + ',' + i[1] + ',' + i[2] + ',' + i[3] + ',' + str(value) + ',' + i[5] + ',' + i[6] + ',' + i[7]+ ',' + i[8] + '\n'
    j = j + 1
f = open ("players.txt","w")
for i in fileinfo:
    f.write(i)
f.close()
This is probably a very complex way of doing what I want to do. Please can you help me with my objective? Feel free to rewrite my entire code, but please explain in detail what you have done. I am quite new to coding.
For future readers
There are two answers that work. I can only tick one of them. Hope this question helps future readers as it has done me.
The problems with your code are as follows:
Stripping the whitespace around the number is good hygiene, though int() does not require it:
value = int(i[4]) already accepts surrounding whitespace (int(" 1") returns 1). The strip matters as soon as you compare the field as a string, because " 0" == "0" is False; use i[4].strip() in that case.
You're converting the value to an integer, then comparing that integer to a string. This will always evaluate to False.
value = int(i[4])
if value == "0":
You're incrementing j twice per loop, which is why your code crashes with an IndexError. I suggest using enumerate instead of manually maintaining j.
The fixed code could look like this:
for j, i in enumerate(fileinfo):
    i = i.strip()
    i = i.split(",")
    value = i[4].strip()
    if value == "0":
        i[4] = "1"
    elif value == "7":
        i[4] = "6"
    fileinfo[j] = ','.join(i) + '\n'
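The loop above can be tried without touching players.txt by running the same logic on an in-memory list. This sketch wraps it in a hypothetical helper, fix_lines, and feeds it three of the sample lines from the question:

```python
def fix_lines(lines):
    """Return new lines with field 4 changed from 0 to 1 or from 7 to 6."""
    fixed = []
    for line in lines:
        fields = line.strip().split(",")
        value = fields[4].strip()  # compare as a string, ignoring padding
        if value == "0":
            fields[4] = "1"
        elif value == "7":
            fields[4] = "6"
        fixed.append(",".join(fields) + "\n")
    return fixed

sample = [
    "Sarah,Brown,S.brown#anemailaddress.com,01866522555, 5,0, 0, 0, 0\n",
    "Andrew,Smith,A.smith#anemailaddress.com,01899512785, 7,0, 0, 0, 0\n",
    "Kevin,White,K.white#anemailaddress.com,01579122345, 0,0, 0, 0, 0\n",
]
result = fix_lines(sample)
for line in result:
    print(line, end="")
```

Lines whose value is neither 0 nor 7 come back unchanged, including their original spacing.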
Here is a simplified version of your code (Totally untested).
with open("players.txt", "r") as f:
    parts = [[y.strip() for y in x.split(",")] for x in f if x.strip()]
with open("players.txt", "w") as f:
    for part in parts:
        # So if it's 0 then + 1, if it's 7 then -1
        if part[4] == "0":
            part[4] = "1"
        elif part[4] == "7":
            part[4] = "6"
        # Python 3 form of the Python 2 statement: print>>f, ",".join(part)
        print(",".join(part), file=f)
You increment j twice each time through the loop, when it is supposed to track the position of each line, and thus it gets too big and causes the error.
Here is a corrected version of your code (explanation below). I tried to make as few changes as possible.
f = open("players.txt","r")
fileinfo = f.readlines()
f.close()
# enumerate yields (index, line) tuples; tuples are immutable, so unpack them
for index, line in enumerate(fileinfo):
    fields = line.strip().split(",")
    # Make sure we don't go out of range; if so, skip to the next line.
    if len(fields) < 5:
        continue
    value = int(fields[4].strip())
    if value == 0:
        value += 1
    elif value == 7:
        value -= 1
    fields[4] = str(value)
    # Put the newline back so skipped and rewritten lines stay consistent
    fileinfo[index] = ",".join(fields) + "\n"
f = open("players.txt","w")
f.write("".join(fileinfo))
f.close()
value is an int, but you were comparing it against strings.
j was not needed and I replaced it with the enumerate function. This yields (index, value) pairs for you to use.
You were checking value twice and setting i twice. I corrected this by cleaning up the if blocks and utilizing elif.
The .join method saves you a lot of work. It creates a string from a list with a delimiter. ' '.join(['Hello', 'world!']) becomes Hello world!.
If there are any lines which are too short (a blank line, for example), the code would raise an error. I added a check that skips such a line by utilizing continue.
Stripping the whitespace around the element before converting it is harmless and keeps the intent clear, although int() itself ignores surrounding whitespace.
When adding to a value, it's often easier to utilize += and -=.
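The individual points above can each be checked in isolation. The values in this short sketch are made up for illustration:

```python
# enumerate pairs each item with its position.
pairs = list(enumerate(["aaa", "bbb"]))
print(pairs)  # [(0, 'aaa'), (1, 'bbb')]

# An int never compares equal to a string, even when they look alike.
print(0 == "0")  # False

# strip removes surrounding whitespace, so string comparisons behave.
print(" 7 ".strip() == "7")  # True

# join builds one string from a list, placing the delimiter between items.
joined = ",".join(["Kevin", "White", "1"])
print(joined)  # Kevin,White,1
```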
This is what Scott Hunter meant:
j = 0
for i in fileinfo:
    # Do something on fileinfo[j]
    j = j + 1
    # Do something on fileinfo[j]
    j = j + 1
Let's say you have three lines in fileinfo: ['aaa', 'bbb', 'ccc']
On first loop iteration you have i = 'aaa' and j = 0. fileinfo[0] points to the first line.
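On the second iteration i = 'bbb' and j = 2. fileinfo[2] points to the third line.
After the first increment in that iteration j = 3, so the second access (fileinfo[3]) is already past the end of the list and raises the IndexError. Had the loop survived, the third iteration would start with i = 'ccc' and j = 4, pointing at a fifth line that does not exist.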
Only increment the value of j once per loop iteration and i/j will stay in sync.
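The drift is easy to reproduce. The following sketch runs the double-increment loop on the three-line example and records which indices it reaches before failing; the touched and error names are hypothetical bookkeeping added for the demonstration:

```python
fileinfo = ['aaa', 'bbb', 'ccc']
j = 0
touched = []   # indices the loop manages to access before failing
error = None
try:
    for i in fileinfo:
        fileinfo[j] = fileinfo[j]   # first "do something on fileinfo[j]"
        touched.append(j)
        j = j + 1
        fileinfo[j] = fileinfo[j]   # second access, one slot further on
        touched.append(j)
        j = j + 1
except IndexError as exc:
    error = exc

print(touched)               # [0, 1, 2]
print(type(error).__name__)  # IndexError
```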
Scott Hunter is right about you erroneously incrementing j twice in the loop. You would probably also want to write the whole thing in a more readable way, such as:
filename = 'players.txt'
with open(filename, 'r') as f:
    lines = f.readlines()
output = []
for line in lines:
    split_line = line.split(',')
    # list() is needed on Python 3, where map returns an iterator
    split_line = list(map(str.strip, split_line))
    num = split_line[4]
    if num == '0':
        split_line[4] = '1'
    if num == '7':
        split_line[4] = '6'
    output.append(','.join(split_line))
with open(filename, 'w') as f:
    f.write('\n'.join(output))
This assumes you don't care about the whitespace between the commas.
I create a dictionary of the most used words and take the top ten. I need the list in the message to be in order, but I can't see how to do that without making a list, which I can't use. Here is my code. I am aware dictionaries cannot be sorted, but I still need help.
most_used_words = Counter()
zewDict = Counter(most_used_words).most_common(10)
newDict = dict(zewDict)
keys = newDict.keys()
values = newDict.values()
msg = ('Here is your breakdown of your most used words: \n\n'
'Word | Times Used'
'\n:--:|:--:'
'\n' + str(keys[0]).capitalize() + '|' + str(values[0]) +
'\n' + str(keys[1]).capitalize() + '|' + str(values[1]) +
'\n' + str(keys[2]).capitalize() + '|' + str(values[2]) +
'\n' + str(keys[3]).capitalize() + '|' + str(values[3]) +
'\n' + str(keys[4]).capitalize() + '|' + str(values[4]) +
'\n' + str(keys[5]).capitalize() + '|' + str(values[5]) +
'\n' + str(keys[6]).capitalize() + '|' + str(values[6]) +
'\n' + str(keys[7]).capitalize() + '|' + str(values[7]) +
'\n' + str(keys[8]).capitalize() + '|' + str(values[8]) +
'\n' + str(keys[9]).capitalize() + '|' + str(values[9]))
r.send_message(user, 'Most Used Words', msg)
How would I do it so the msg prints the words in order from most used word on the top to least on the bottom with the correct values for the word?
Edit: I know dictionaries cannot be sorted on their own, so can I work around this somehow?
Once you have the values it's as simple as:
import collections

print('Word | Times Used')
for e, t in collections.Counter(values).most_common(10):
    print("%s|%d" % (e, t))
This prints something like:
Word | Times Used
e|4
d|3
a|2
c|2
From the Docs: most_common([n])
Return a list of the n most common elements and their counts from the
most common to the least. If n is not specified, most_common() returns
all elements in the counter. Elements with equal counts are ordered
arbitrarily:
>>> Counter('abracadabra').most_common(3)
[('a', 5), ('r', 2), ('b', 2)]
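A small sketch of most_common on a word list, with made-up words and counts chosen so that there are no ties among the top entries:

```python
from collections import Counter

words = ['spam', 'eggs', 'spam', 'ham', 'spam', 'eggs']
top = Counter(words).most_common(2)
print(top)  # [('spam', 3), ('eggs', 2)]

# The result is already ordered from most to least common, so it can be
# turned into table rows directly.
rows = '\n'.join('%s|%s' % (word.capitalize(), count) for word, count in top)
print(rows)
```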
Your code can be:
from collections import Counter
c = Counter(most_used_words)
msg = "Here is your breakdown of your most used words:\n\nWords | Times Used\n:--:|:--:\n"
msg += '\n'.join('%s|%s' % (k.capitalize(), v) for (k, v) in c.most_common(10))
r.send_message(user, 'Most Used Words', msg)
import operator
newDict = dict(zewDict)
# items() replaces the Python 2-only iteritems()
sorted_newDict = sorted(newDict.items(), key=operator.itemgetter(1))
msg = ''
for key, value in sorted_newDict:
    # strings have no append method, so concatenate with +=
    msg += '\n' + str(key).capitalize() + '|' + str(value)
This sorts by the dictionary values in ascending order. To get the most used word first, add reverse=True to sorted().
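For the most-used-first order the question asks for, a minimal sketch with reverse=True might look like this; the counts dict holds made-up data standing in for newDict:

```python
import operator

counts = {'the': 12, 'cat': 3, 'sat': 7}  # stand-in for newDict

# Sort the (word, count) pairs by count, largest first.
ranked = sorted(counts.items(), key=operator.itemgetter(1), reverse=True)

msg = 'Word | Times Used\n:--:|:--:'
for word, times in ranked:
    msg += '\n' + word.capitalize() + '|' + str(times)
print(msg)
```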