Error when accessing pandas series by index - python

I have a piece of code:
texts = []
for i in range(len(df.index)):
    _text = ''
    for c in df.columns:
        _text += c + ':' + str(df[c][i]) + '<br>'
    texts.append(_text)
and I get the error ValueError: No axis named 1 for object type <class 'pandas.core.series.Series'> at the line _text += c + ':' + str(df[c][i]) + '<br>'.
I changed the [i] to every accessor I know (at, iat, loc, iloc), but none of them works; they all raise the same error.

This works for me. I used .iloc[i] on line 5:
texts = []
for i in range(len(df.index)):
    _text = ''
    for c in df.columns:
        _text += c + ':' + str(df[c].iloc[i]) + '<br>'
    texts.append(_text)
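A possible reason the positional accessor helps: df[c][i] looks up i as an index label, while .iloc[i] always selects by position. The sketch below assumes a made-up two-row frame whose index labels are 10 and 20 (illustrative values, not taken from the question), so label 0 does not exist but position 0 does.

```python
import pandas as pd

# Hypothetical data: the index labels (10, 20) differ from the positions (0, 1).
df = pd.DataFrame({"a": [1, 2], "b": ["x", "y"]}, index=[10, 20])

texts = []
for i in range(len(df.index)):
    # .iloc selects by position, so it does not depend on the index labels.
    texts.append("".join(f"{c}:{df[c].iloc[i]}<br>" for c in df.columns))

print(texts)
```

Building each row with str.join also avoids the repeated string concatenation of the original loop.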

Related

pandas iterate over column values at once and generate range

I have a pandas dataframe like the one below:
df1 = pd.DataFrame({'biz': [18, 23], 'seg': [30, 34], 'PID': [40, 52]})
I would like to pass all the values from each column at once to a for loop.
For example, I am trying the below:
cols = ['biz','seg','PID']
for col in cols:
    for i, j in df1.col.values:
        print("D" + str(i) + ":" + "F" + str(j))
        print("Q" + str(i) + ":" + "S" + str(j))
        print("AB" + str(i) + ":" + "AD" + str(j))
but this doesn't work and I get an error:
TypeError: cannot unpack non-iterable numpy.int64 object
I expect my output to be as below:
D18:F23
Q18:S23
AB18:AD23
D30:F34
Q30:S34
AB30:AD34
D40:F52
Q40:S52
AB40:AD52
The mistake is in the innermost for loop.
You are requesting an iterator over a 1-dimensional array of values. This iterator yields scalar values, which cannot be unpacked into i and j.
If your dataframe has exactly 2 items per column, then this should suffice:
cols = ['biz','seg','PID']
for col in cols:
    i, j = getattr(df1, col).values
    print("D" + str(i) + ":" + "F" + str(j))
    print("Q" + str(i) + ":" + "S" + str(j))
    print("AB" + str(i) + ":" + "AD" + str(j))
Alternatives
Pandas using loc
This is actually the simplest way to solve it, but it only occurred to me now. We use the column name col along with loc to get all rows (given by : in loc[:, col]):
cols = ['biz','seg','PID']
for col in cols:
    i, j = df1.loc[:, col].values
Attrgetter
We can use attrgetter from the operator module to fetch one attribute, or several at once:
from operator import attrgetter
cols = ['biz','seg','PID']
cols = attrgetter(*cols)(df1)
for col in cols:
    i, j = col.values
Attrgetter 2
This approach is similar to the one above, except that we collect every i in one tuple and every j in another, with each entry corresponding to one column.
from operator import attrgetter
cols = ['biz','seg','PID']
cols = attrgetter(*cols)(df1)
cols = [col.values for col in cols]
all_i, all_j = zip(*cols)
Pandas solution
This approach uses just pandas functions. It gets the column position using df1.columns.get_loc(col_name), and then uses .iloc to index the values. In .iloc[a, b] we use : in place of a to select all rows, and index in place of b to select just that column.
cols = ['biz','seg','PID']
for col in cols:
    index = df1.columns.get_loc(col)
    i, j = df1.iloc[:, index]
    # do the printing here
# Stop one row early: each line also reads row i + 1.
for i in range(len(df1) - 1):
    print('D' + str(df1.iloc[i,0]) + ':' + 'F' + str(df1.iloc[i+1,0]))
    print('Q' + str(df1.iloc[i,0]) + ':' + 'S' + str(df1.iloc[i+1,0]))
    print('AB' + str(df1.iloc[i,0]) + ':' + 'AD' + str(df1.iloc[i+1,0]))
    print('D' + str(df1.iloc[i,1]) + ':' + 'F' + str(df1.iloc[i+1,1]))
    print('Q' + str(df1.iloc[i,1]) + ':' + 'S' + str(df1.iloc[i+1,1]))
    print('AB' + str(df1.iloc[i,1]) + ':' + 'AD' + str(df1.iloc[i+1,1]))
    print('D' + str(df1.iloc[i,2]) + ':' + 'F' + str(df1.iloc[i+1,2]))
    print('Q' + str(df1.iloc[i,2]) + ':' + 'S' + str(df1.iloc[i+1,2]))
    print('AB' + str(df1.iloc[i,2]) + ':' + 'AD' + str(df1.iloc[i+1,2]))
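The repeated print statements in the answers above can be folded into a second loop over the letter pairs. The following is only a sketch: the name prefix_pairs is introduced here for illustration, and like the answers it assumes exactly two rows per column.

```python
import pandas as pd

df1 = pd.DataFrame({'biz': [18, 23], 'seg': [30, 34], 'PID': [40, 52]})

# Hypothetical helper list: the letter pairs used in the expected output.
prefix_pairs = [("D", "F"), ("Q", "S"), ("AB", "AD")]

lines = []
for col in ['biz', 'seg', 'PID']:
    # Unpacking into i, j only works when the column has exactly two values.
    i, j = df1[col].tolist()
    for left, right in prefix_pairs:
        lines.append(f"{left}{i}:{right}{j}")

print("\n".join(lines))
```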

Dynamically create string from pandas column

I have two data frames, df and anomalies:
d = {'10028': [0], '1058': [25], '20120': [29], '20121': [22],'20122': [0], '20123': [0], '5043': [0], '5046': [0]}
df = pd.DataFrame(data=d)
anomalies is a mirror copy of df, except that each value is 0 or 1: 1 marks an anomaly and 0 a non-anomaly.
d = {'10028': [0], '1058': [1], '20120': [1], '20121': [0],'20122': [0], '20123': [0], '5043': [0], '5046': [0]}
anomalies = pd.DataFrame(data=d)
and I am converting that into a specific format with the code below:
details = (
    '\n' + 'Metric Name' + '\t' + 'Count' + '\t' + 'Anomaly' +
    '\n' + '10028:' + '\t' + str(df.tail(1)['10028'][0]) + '\t' + str(anomalies['10028'][0]) +
    '\n' + '1058:' + '\t' + '\t' + str(df.tail(1)['1058'][0]) + '\t' + str(anomalies['1058'][0]) +
    '\n' + '20120:' + '\t' + str(df.tail(1)['20120'][0]) + '\t' + str(anomalies['20120'][0]) +
    '\n' + '20121:' + '\t' + str(round(df.tail(1)['20121'][0], 2)) + '\t' + str(anomalies['20121'][0]) +
    '\n' + '20122:' + '\t' + str(round(df.tail(1)['20122'][0], 2)) + '\t' + str(anomalies['20122'][0]) +
    '\n' + '20123:' + '\t' + str(round(df.tail(1)['20123'][0], 3)) + '\t' + str(anomalies['20123'][0]) +
    '\n' + '5043:' + '\t' + str(round(df.tail(1)['5043'][0], 3)) + '\t' + str(anomalies['5043'][0]) +
    '\n' + '5046:' + '\t' + str(round(df.tail(1)['5046'][0], 3)) + '\t' + str(anomalies['5046'][0]) +
    '\n\n' + 'message:' + '\t' +
    'Something wrong with the platform as there is a spike in [values where anomalies == 1].'
)
The problem is that the column names change on every run. In this run they are '10028', '1058', '20120', '20121', '20122', '20123', '5043', '5046', but in the next run they may be '10029', '1038', '20121', '20122', '20123', '5083', '5946'.
How can I create details dynamically, depending on which columns are present in the data frame? I don't want to hard-code them, and in the message I want to pass the names of the columns whose value is 1.
The value of columns will always be either 1 or 0.
Try this:
# first part of the string
s = '\n' + 'Metric Name' + '\t' + 'Count' + '\t' + 'Anomaly'
# dynamically add the data
# (Series.items() replaces iteritems(), which pandas 2.0 removed)
for idx, val in df.iloc[-1].items():
    s += f'\n{idx}\t{val}\t{anomalies[idx][0]}'
    # for Python 3.5 and below, use this
    # s += '\n{}\t{}\t{}'.format(idx, val, anomalies[idx][0])
# last part
s += ('\n\n' + 'message:' + '\t' +
      'Something wrong with the platform as there is a spike in [values where anomalies == 1].'
      )
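The answer above leaves the "[values where anomalies == 1]" placeholder in the message. One way to fill it in is to collect the flagged column names first. This is a sketch under the question's assumptions (single-row frames with matching columns); the names flagged and rows are introduced here for illustration.

```python
import pandas as pd

d = {'10028': [0], '1058': [25], '20120': [29], '20121': [22],
     '20122': [0], '20123': [0], '5043': [0], '5046': [0]}
df = pd.DataFrame(data=d)
a = {'10028': [0], '1058': [1], '20120': [1], '20121': [0],
     '20122': [0], '20123': [0], '5043': [0], '5046': [0]}
anomalies = pd.DataFrame(data=a)

# Names of the columns whose anomaly flag is 1 in the last row.
flagged = [col for col in anomalies.columns if anomalies[col].iloc[-1] == 1]

# One tab-separated line per column that is present in this run.
rows = [f"{col}\t{df[col].iloc[-1]}\t{anomalies[col].iloc[-1]}" for col in df.columns]

details = (
    "\nMetric Name\tCount\tAnomaly\n"
    + "\n".join(rows)
    + "\n\nmessage:\tSomething wrong with the platform as there is a spike in "
    + ", ".join(flagged) + "."
)
print(details)
```

Because both lists are derived from the columns themselves, nothing needs to change when the column names differ between runs.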

How to check and change a value in a Text file? (Python 2.7.11)

Text file explained:
Joe,Bloggs,J.bloggs#anemailaddress.com,01269512355, 1 ,0, 0, 0, 0
Fname, Lname, Email, number, value I want to check, ..., ..., ..., ...
Objective: check that the number at index 4 of each line is not 0 or 7.
If it is 0, it changes to 1, and if it is 7, it changes to 6.
So if it's 0 then + 1, if it's 7 then - 1.
Text file:
Joe,Bloggs,J.bloggs#anemailaddress.com,01269512355, 1,0, 0, 0, 0
Sarah,Brown,S.brown#anemailaddress.com,01866522555, 5,0, 0, 0, 0
Andrew,Smith,A.smith#anemailaddress.com,01899512785, 7,0, 0, 0, 0
Kevin,White,K.white#anemailaddress.com,01579122345, 0,0, 0, 0, 0
Samantha,Collins,S.collins#anemailaddress.com,04269916257, 0,0, 0, 0, 0
After the code has run, it should look like this:
Joe,Bloggs,J.bloggs#anemailaddress.com,01269512355, 1,0, 0, 0, 0
Sarah,Brown,S.brown#anemailaddress.com,01866522555, 5,0, 0, 0, 0
Andrew,Smith,A.smith#anemailaddress.com,01899512785, 6,0, 0, 0, 0
Kevin,White,K.white#anemailaddress.com,01579122345, 1,0, 0, 0, 0
Samantha,Collins,S.collins#anemailaddress.com,04269916257, 1,0, 0, 0, 0
The code I have so far produces an error:
fileinfo[j] = i[0] + ',' + i[1] + ',' + i[2] + ',' + i[3] + ',' + str(value) + ',' + i[5] + ',' + i[6] + ',' + i[7]+ ',' + i[8] + '\n'
IndexError: list assignment index out of range
Code:
f = open ("players.txt","r")
fileinfo = f.readlines()
f.close()
j = 0
for i in fileinfo:
    i = i.strip()
    i = i.split(",")
    value = int(i[4])
    if value == "0":
        value = value + 1
    fileinfo[j] = i[0] + ',' + i[1] + ',' + i[2] + ',' + i[3] + ',' + str(value) + ',' + i[5] + ',' + i[6] + ',' + i[7]+ ',' + i[8] + '\n'
    j = j + 1
    if value == "7":
        value = value - 1
    fileinfo[j] = i[0] + ',' + i[1] + ',' + i[2] + ',' + i[3] + ',' + str(value) + ',' + i[5] + ',' + i[6] + ',' + i[7]+ ',' + i[8] + '\n'
    j = j + 1
f = open ("players.txt","w")
for i in fileinfo:
    f.write(i)
f.close()
This is probably a very complex way of doing what I want to do. Could you please help me with my objective? Feel free to rewrite my entire code, but please explain in detail what you have done. I am quite new to coding.
For future readers
There are two answers that work. I can only tick one of them. Hope this question helps future readers as it has done me.
The problems with your code are as follows:
Stripping the whitespace around the number before converting it is good practice, although int() itself ignores leading and trailing whitespace, so value = int(i[4]) does not crash on " 1". Stripping does matter if you compare the field as a string: use i[4].strip() for that.
You're converting the value to an integer, then comparing that integer to a string. This always evaluates to False.
value = int(i[4])
if value == "0":
You're incrementing j twice per loop, which is why your code crashes with an IndexError. I suggest using enumerate instead of manually maintaining j.
The fixed code could look like this:
for j, i in enumerate(fileinfo):
    i = i.strip()
    i = i.split(",")
    value = i[4].strip()
    if value == "0":
        i[4] = "1"
    elif value == "7":
        i[4] = "6"
    fileinfo[j] = ','.join(i) + '\n'
Here is a simplified version of your code (totally untested).
with open ("players.txt","r") as f:
    parts = [[y.strip() for y in x.split(",")] for x in f if x.strip()]
with open("players.txt", "w") as f:
    for part in parts:
        # So if it's 0 then + 1, if it's 7 then - 1
        if part[4] == "0": part[4] = "1"
        elif part[4] == "7": part[4] = "6"
        # Python 2 syntax for printing to a file
        print>>f, ",".join(part)
You increment j twice each time through the loop, when it is supposed to track the position of each line, and thus it gets too big and causes the error.
Here is a corrected version of your code (explanation below). I tried to make as few changes as possible.
f = open("players.txt","r")
fileinfo = f.readlines()
f.close()
for index, line in enumerate(fileinfo):
    fields = line.strip().split(",")
    # Make sure we don't go out of range; if so, skip to the next line.
    if len(fields) < 5:
        continue
    value = int(fields[4].strip())
    if value == 0:
        value += 1
    elif value == 7:
        value -= 1
    fields[4] = str(value)
    # Put the newline back, since strip() removed it.
    fileinfo[index] = ",".join(fields) + "\n"
f = open("players.txt","w")
f.writelines(fileinfo)
f.close()
value is an int, but you were comparing it against strings.
j was not needed and I replaced it with the enumerate function. This yields (index, value) pairs for you to use.
You were checking value twice and setting i twice. I corrected this by cleaning up the if blocks and utilizing elif.
The .join method saves you a lot of work. It creates a string from a list with a delimiter. ' '.join(['Hello', 'world!']) becomes Hello world!.
If any line is too short (a blank line, for example), the code would raise an error. I added a check which skips such a line by utilizing continue.
You should strip the whitespace around the element before attempting to convert to int.
When adding to a value, it's often easier to utilize += and -=.
This is what Scott Hunter meant:
j = 0
for i in fileinfo:
    # Do something on fileinfo[j]
    j = j + 1
    # Do something on fileinfo[j]
    j = j + 1
Let's say you have three lines in fileinfo: ['aaa', 'bbb', 'ccc']
On the first loop iteration you have i = 'aaa' and j = 0. fileinfo[0] points to the first line.
On the second iteration i = 'bbb' and j = 2. fileinfo[2] points to the third line.
On the third iteration i = 'ccc' and j = 4. You try to read the fifth line (fileinfo[4]), which does not exist.
Only increment the value of j once per loop iteration and i and j will stay in sync.
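The walkthrough above can be reproduced in a few lines. This sketch only records the value of j at the start of each iteration (the list seen is introduced here for illustration) and then shows the index/line pairs that enumerate produces instead.

```python
fileinfo = ['aaa', 'bbb', 'ccc']

# Buggy pattern: j advances twice per iteration.
j = 0
seen = []
for line in fileinfo:
    seen.append(j)  # value of j at the start of this iteration
    j = j + 1
    j = j + 1

print(seen)  # [0, 2, 4]: index 4 is out of range for a 3-item list

# enumerate keeps the index and the line in sync automatically.
pairs = list(enumerate(fileinfo))
print(pairs)  # [(0, 'aaa'), (1, 'bbb'), (2, 'ccc')]
```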
Scott Hunter is right about you erroneously incrementing j twice in the loop. You would probably also want to write the whole thing in a more readable way, such as:
filename = 'players.txt'
with open(filename, 'r') as f:
    lines = f.readlines()
output = []
for line in lines:
    # A list comprehension works on both Python 2 and 3 (map returns an iterator on 3).
    split_line = [part.strip() for part in line.split(',')]
    num = split_line[4]
    if num == '0':
        split_line[4] = '1'
    if num == '7':
        split_line[4] = '6'
    output.append(','.join(split_line))
with open(filename, 'w') as f:
    f.write('\n'.join(output))
assuming you don't care about the whitespace between the commas.
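To check the rule without touching players.txt, the per-line logic can be pulled into a small function and run on the sample lines from the question. This is a sketch; the function name adjust_line and its column parameter are made up for illustration, and like the answer above it drops the whitespace around the fields.

```python
def adjust_line(line, column=4):
    """Return the comma-separated line with 0 -> 1 and 7 -> 6 in the given column."""
    fields = [field.strip() for field in line.split(",")]
    if len(fields) <= column:
        return line.strip()  # too short (e.g. a blank line): leave it unchanged
    if fields[column] == "0":
        fields[column] = "1"
    elif fields[column] == "7":
        fields[column] = "6"
    return ",".join(fields)


sample = [
    "Sarah,Brown,S.brown#anemailaddress.com,01866522555, 5,0, 0, 0, 0",
    "Andrew,Smith,A.smith#anemailaddress.com,01899512785, 7,0, 0, 0, 0",
    "Kevin,White,K.white#anemailaddress.com,01579122345, 0,0, 0, 0, 0",
]
result = [adjust_line(line) for line in sample]
print("\n".join(result))
```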

Python: Redundancy when iterating through nested dictionary

I have a nested dictionary which I am trying to loop through in order to write to an Excel file.
This is the code which initiates and creates the nested dictionary:
from collections import defaultdict
def tree(): return defaultdict(tree)
KMstruct = tree()
for system in sheet.columns[0]:
    if system.value not in KMstruct:
        KMstruct[system.value]
        for row in range(1,sheet.get_highest_row()+1):
            if sheet['A'+str(row)].value == system.value and sheet['B'+str(row)].value not in KMstruct:
                KMstruct[system.value][sheet['B'+str(row)].value]
                if sheet['B'+str(row)].value == sheet['B'+str(row)].value and sheet['C'+str(row)].value not in KMstruct:
                    KMstruct[system.value][sheet['B'+str(row)].value][sheet['C'+str(row)].value]
                    if sheet['C'+str(row)].value == sheet['C'+str(row)].value and sheet['D'+str(row)].value not in KMstruct:
                        KMstruct[system.value][sheet['B'+str(row)].value][sheet['C'+str(row)].value][sheet['D'+str(row)].value]
                        KMstruct[system.value][sheet['B'+str(row)].value][sheet['C'+str(row)].value][sheet['D'+str(row)].value] = [sheet['E'+str(row)].value]
This is the code where I loop through it:
for key in KMstruct.keys():
    r += 1
    worksheet.write(r, col, key)
    for subkey in KMstruct[key]:
        if currsubkeyval != subkey:
            r += 1
            worksheet.write(r, col, key)
        r +=1
        worksheet.write(r, col, key + '\\' + subkey)
        for item in KMstruct[key][subkey]:
            if curritemval != item:
                r +=1
                worksheet.write(r, col, key + '\\' + subkey)
            for subitem in KMstruct[key][subkey][item]:
                r += 1
                worksheet.write(r, col, key + '\\' + subkey + '\\' + item)
                worksheet.write(r, col + 1, subitem)
                curritemval = item
                for finalitem in KMstruct[key][subkey][item][subitem]:
                    r += 1
                    worksheet.write(r, col, key + '\\' + subkey + '\\' + item + '\\' + subitem)
                    worksheet.write(r, col + 1, KMstruct[key][subkey][item][subitem])
Bear with me on this code since I am a noob; I am aware that it is not so beautiful. Anyhow, my problem is the last loop. I am trying to take the string values in KMstruct[key][subkey][item][subitem], but the loop variable finalitem goes through each character of the string value (note: the key subitem contains a list of strings). This means that if I only have one value to write, it gets written as many times as there are characters in the string.
E.g.: the value apple will be written on a new Excel row 5 times.
What am I doing wrong here?
Edit: The redundancy issue has been solved, but now I need to understand whether I am doing something wrong when assigning my list of strings (the values finalitem iterates over) to the subitem key.
The problem is that in Python a str is also an iterable, for example:
>>> for s in 'hello':
...     print(s)
h
e
l
l
o
So you either need to avoid iterating when the value is a str, or wrap the str in another iterable (e.g. a list) so that it can be handled the same way. This is made slightly difficult by how you are structuring your code. For example, the following:
for key in KMstruct.keys():
    for subkey in KMstruct[key]:
...can be written as:
for key, value in KMstruct.items():
    for subkey, subvalue in value.items():
...giving you the value on each loop, making it possible to apply a test.
for key, val in KMstruct.items():
    r += 1
    worksheet.write(r, col, key)
    for subkey, subval in val.items():
        if currsubkeyval != subkey:
            r += 1
            worksheet.write(r, col, key)
        r +=1
        worksheet.write(r, col, key + '\\' + subkey)
        for item, itemval in subval.items():
            if curritemval != item:
                r +=1
                worksheet.write(r, col, key + '\\' + subkey)
            for subitem, subitemval in itemval.items():
                r += 1
                worksheet.write(r, col, key + '\\' + subkey + '\\' + item)
                worksheet.write(r, col + 1, subitem)
                curritemval = item
                # I am assuming at this point subitemval is either a list or a str?
                if isinstance(subitemval, str):
                    # If it is, wrap it as a list, so the next block works as expected
                    subitemval = [subitemval]
                for finalitem in subitemval:
                    r += 1
                    worksheet.write(r, col, key + '\\' + subkey + '\\' + item + '\\' + subitem)
                    worksheet.write(r, col + 1, finalitem) # This should be finalitem, correct?
Here we test the type of subitemval and, if it is a str, wrap it in a list so the following block iterates as expected. There is also an apparent bug where you are not outputting finalitem on the last line. It's not possible to test this without your example data, but it should be functionally equivalent.
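The wrapping step can be isolated into a tiny helper to see the difference. This is a sketch for Python 3; the name as_list is made up for illustration (on Python 2 the check would need basestring instead of str).

```python
def as_list(value):
    """Wrap a lone string in a list; turn any other iterable into a list."""
    if isinstance(value, str):
        return [value]
    return list(value)


# Iterating a bare string visits each character...
chars = [c for c in "apple"]
print(len(chars))  # 5

# ...while the wrapped value is visited exactly once.
print(as_list("apple"))            # ['apple']
print(as_list(["apple", "pear"]))  # ['apple', 'pear']
```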
Your immediate problem is that the last line of your example should be:
worksheet.write(r,col+1, finalitem)
BTW, your code would be a lot easier to read if you created occasional temporary variables like:
subitemlist = KMstruct[key][subkey][item]
for subitem in subitemlist:
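Going a step beyond temporary variables, the four nested loops could also be replaced by a recursive walk over the tree. This is a different technique from the code in the question, shown only as a sketch: the function walk, the variable km and the sample keys are made up for illustration, and the worksheet calls are left out.

```python
from collections import defaultdict


def tree():
    return defaultdict(tree)


def walk(node, path=()):
    """Yield (path, leaf) pairs; anything that is not a dict counts as a leaf."""
    if isinstance(node, dict):
        for key, child in node.items():
            for pair in walk(child, path + (key,)):
                yield pair
    else:
        yield path, node


# Hypothetical data with the same four-level shape as KMstruct.
km = tree()
km["sys"]["sub"]["item"]["subitem"] = ["apple"]

rows = [("\\".join(path), leaf) for path, leaf in walk(km)]
print(rows)
```

Each row pairs the backslash-joined key path with the list stored at that leaf, so the list can be written out item by item without iterating over the characters of a string.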

Write multiple columns on .xls through Python 2.7

out_file.write('Position'+'\t'+'Hydrophobic'+'\n')
for i in position:
    out_file.write(str(i)+'\n')
for j in value:
    out_file.write('\t'+str(j)+'\n')
so it says
Position Hydrophobic
0 a
1 b
2 c
#... and so on
When it writes to the Excel file, it puts each j from value at the bottom of the column, below the i entries from position.
How do I put them side by side with '\t' and '\n'?
for i, j in zip(position, value):
    out_file.write(str(i) + '\t' + str(j) + '\n')
or better:
    out_file.write('%s\t%s\n' % (str(i), str(j)))
or better:
text = ['%s\t%s' % (str(i), str(j)) for i, j in zip(position, value)]
text = '\n'.join(text)
out_file.write(text)
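The zip approach can be tried without creating a file by writing to an in-memory buffer. This is a sketch: io.StringIO stands in for the real out_file, and the sample lists are made up to match the output shown in the question.

```python
import io

# Hypothetical sample data matching the question's output.
position = [0, 1, 2]
value = ['a', 'b', 'c']

out_file = io.StringIO()  # stand-in for the real output file
out_file.write('Position\tHydrophobic\n')
for i, j in zip(position, value):
    # One row per pair: both columns on the same line, separated by a tab.
    out_file.write('%s\t%s\n' % (i, j))

content = out_file.getvalue()
print(content)
```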
