AttributeError: 'str' object has no attribute 'size' - python

This is related to my previous, still unanswered question: Inserting random values based on condition
for row in range(len(df)):
    while df["Full_Input_Length"][row] >= 55:
        df["Input3"][row] = np.random.choice(sentence_list, size=len(df))
        df["Full_Input"][row] = np.array2string(df['TitleTag'][row]).replace("'","") + " " + np.array2string(df['Input2'][row]).replace("'","") + " " + np.array2string(df['Input3'][row]).replace("'","") + " " + np.array2string(df['Input4'][row]).replace("'","") + " " + np.array2string(df['Input5'][row]).replace("'","")
        df["Full_Input_Length"][row] = len(df["Full_Input"][row])
        break
I stuck to my initial plan and kept trying to write this for loop properly. The error I am getting is on line 4, where it says
AttributeError: 'str' object has no attribute 'size'
Basically I am trying to concatenate multiple strings into column Full_Input. I would assume that now the problem is at df["Full_Input"][row] but I am not quite sure how to write it properly so my code would run. I tried different ways but none worked - other errors popped up.
What am I getting wrong? Is there any other way of doing this?
Full code:
df['TitleTag'] = np.nan
for col in range(len(df)):
    condlist = [df['BrandId'][col] != 2, df['BrandId'][col] == 2]
    choicelist = [df['Name'][col] + ' fra ' + df['BrandName'][col], df['Name'][col]]
    df['TitleTag'][col] = np.select(condlist, choicelist)
df['Input1'] = np.nan
for col in range(len(df)):
    condlist = [df['BrandId'][col] != 2, df['BrandId'][col] == 2]
    choicelist = [df['Name'][col] + ' fra ' + df['BrandName'][col], np.nan]
    df['Input1'][col] = np.select(condlist, choicelist)
symbol_list = (['-','=>','|','->'])
df["Input2"] = np.random.choice(symbol_list, size=len(df))
sentence_list = (['Køb online her','Sammenlign priser her','Tjek priser fra 4 butikker','Se produkter fra 4 butikker', 'Stort udvalg fra 4 butikker','Sammenlign og køb'])
df["Input3"] = np.random.choice(sentence_list, size=len(df))
symbol_list2 = (['-','|'])
df["Input4"] = np.random.choice(symbol_list2, size=len(df))
df["Input5"] = "Site.dk"
df["Full_Input"] = df['TitleTag'].astype(str) + " " + df['Input2'].astype(str) + " " + df['Input3'].astype(str) + " " + df['Input4'].astype(str) + " " + df['Input5'].astype(str)
df["Full_Input_Length"] = df["Full_Input"].apply(len)
for col in range(len(df)):
    while df["Full_Input_Length"][col] >= 55:
        df["Input3"][col] = np.random.choice(sentence_list, size=len(df))
        df["Full_Input"][col] = np.array2string(df['TitleTag'][col]).replace("'","") + " " + np.array2string(df['Input2'][col]).replace("'","") + " " + np.array2string(df['Input3'][col]).replace("'","") + " " + np.array2string(df['Input4'][col]).replace("'","") + " " + np.array2string(df['Input5'][col]).replace("'","")
        df["Full_Input_Length"][col] = len(df["Full_Input"][col])
        break
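A possible explanation, offered as a sketch rather than a confirmed answer: np.array2string expects a NumPy array, while the cells of TitleTag, Input2 and the other columns hold plain Python strings, which would likely produce the 'str' object has no attribute 'size' message. Separately, np.random.choice(sentence_list, size=len(df)) returns a whole array, not the single value one cell needs. The version below redraws Input3 only for the rows that are too long and rebuilds the text with ordinary string concatenation. The sample rows, the shortened sentence list, the helper name build_full_input and the retry cap MAX_TRIES are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

COLUMNS = ["TitleTag", "Input2", "Input3", "Input4", "Input5"]
MAX_LENGTH = 55   # limit taken from the question
MAX_TRIES = 10    # assumption: stop after a few redraws instead of looping forever

sentence_list = ["Køb online her", "Sammenlign priser her", "Sammenlign og køb"]

# Hypothetical sample rows; the second title is too long for any sentence to fit.
df = pd.DataFrame({
    "TitleTag": ["Short name", "A considerably longer product name fra SomeBrand"],
    "Input2": ["-", "|"],
    "Input3": ["Sammenlign priser her", "Sammenlign priser her"],
    "Input4": ["-", "|"],
    "Input5": ["Site.dk", "Site.dk"],
})


def build_full_input(frame):
    """Join the five parts with single spaces using plain string concatenation."""
    result = frame[COLUMNS[0]].astype(str)
    for name in COLUMNS[1:]:
        result = result + " " + frame[name].astype(str)
    return result


rng = np.random.default_rng(0)
df["Full_Input"] = build_full_input(df)
df["Full_Input_Length"] = df["Full_Input"].str.len()

for _ in range(MAX_TRIES):
    too_long = df["Full_Input_Length"] >= MAX_LENGTH
    if not too_long.any():
        break
    # Draw exactly one new sentence per over-long row, not len(df) of them.
    df.loc[too_long, "Input3"] = rng.choice(sentence_list, size=int(too_long.sum()))
    df["Full_Input"] = build_full_input(df)
    df["Full_Input_Length"] = df["Full_Input"].str.len()

print(df[["Full_Input", "Full_Input_Length"]])
```

Writing through df.loc with a boolean mask also avoids the chained df["col"][row] = ... assignments, which pandas may apply to a temporary copy. The retry cap matters because a row whose title alone is close to 55 characters can never be shortened by picking another sentence.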

Related

AttributeError: when change font colour in xlwings python

I want to change the font colour of a cell in xlwings when its value is the same as the value in the row above. I tried "worksheet.range(changeFontCell).api.font.Color = rgb_to_int((176,176,176))" to change the font,
but I get the error "AttributeError: '<win32com.gen_py.Microsoft Excel 16.0 Object Library.Range instance at 0x2163906949584>' object has no attribute 'font'"
Below is my source code. How can I solve this error?
lastRow = worksheet.range('A' + str(worksheet.cells.last_cell.row)).end('up').row + 1
startNewRow = 'A' + str(lastRow)
worksheet.range(startNewRow).value = finalResult
finalLastRow = worksheet.range('A' + str(worksheet.cells.last_cell.row)).end('up').row
for i in list(range(lastRow, finalLastRow)):
    if worksheet.range('A' + str(i)).value == worksheet.range('A' + str(i+1)).value:
        changeFontCellColour = 'A' + str(i+1)
        worksheet.range(changeFontCellColour ).api.font.Color = rgb_to_int((176,176,176))
    else:
        continue
Any advice helps! Thank you
I have now solved my problem with the code below, which accesses Font (capital F) on the range's api object instead of font.
lastRow = worksheet.range('A' + str(worksheet.cells.last_cell.row)).end('up').row + 1
startNewRow = 'A' + str(lastRow)
worksheet.range(startNewRow).value = finalResult
finalLastRow = worksheet.range('A' + str(worksheet.cells.last_cell.row)).end('up').row
for i in list(range(lastRow, finalLastRow)):
    if worksheet.range('A' + str(i)).value == worksheet.range('A' + str(i+1)).value:
        changeFontCellColour = 'A' + str(i+1)
        myRange = worksheet.range(changeFontCellColour)
        myRange.api.Font.Color = rgb_to_int((160,160,160))
        continue
    else:
        continue

Need help refining code to run a for loop to summarize economic variables from a csv file?

I have a csv file with a time series of two economic variables (housing starts and unemployment). I have a list of calculations and a text summary that is written from the output of those calculations (basically a paragraph describing the trends in the data). I would like feedback on how I can get a for loop to go through each variable in the csv file so that the final output is one summary per variable.
I tried applying the basic logic of a for loop, but I'm not sure what I have wrong. I looked at a number of examples on Stack Overflow but nothing seems to fit. I'm sure I'm missing something simple, but I haven't been using Python for long, so I'm stuck at this point.
raw_data = pd.read_csv('C:/Users/J042666/Desktop/2019.03 HOUST and GDP.csv')
df = pd.DataFrame(raw_data)
for i in df:
    freq = "monthly "
    units = " million "
    pos = 1
    colname = df.columns[pos]
    alltime = df.mean()
    low = df.min()
    maximum = df.max()
    today = df.iloc[720]
    one_year = df.iloc[709:721].mean()
    two_year = df.iloc[697:721].mean()
    five_year = df.iloc[661:721].mean()
    one_year_vol = df.iloc[709:721].std()
    two_year_vol = df.iloc[697:721].std()
    five_year_vol = df.iloc[661:721].std()
    today_vs_1 = ((today/one_year) -1)*100
    today_vs_2 = ((today/two_year) -1)*100
    today_vs_5 = ((today/five_year) -1)*100
    rolling_1 = df.rolling(window=3).mean()
    rolling_2 = df.rolling(window=6).mean()
    rolling_3 = df.rolling(window=9).mean()
    today_vs_1_rolling = ((today/rolling_1.iloc[720]) -1)*100
    today_vs_2_rolling = ((today/rolling_2.iloc[720]) -1)*100
    today_vs_3_rolling = ((today/rolling_3.iloc[720]) -1)*100
    summary = ("The " + str(freq) + str(colname) + " currently stands at " + str(today) + str(units) + " which compares to the 1,2 and 5 year averages of " + str(one_year) + str(units) + "," + str(two_year) + str(units) + "," + " and " + str(five_year) + str(units) + " respectively. " + " Based on the current " + str(colname) + " levels, that reflects a change of" + str(today_vs_1) + ", " + str(today_vs_2) + " and " + str(today_vs_5) + " respectively." " Since the metric began being tracked, the minimum, maximum and long run average total " + str(low) + str(units) + ", " + str(maximum) + str(units) + " and " + str(alltime) + str(units) + " respectively. " "The 1, 2 and 5 year standard deviation for " + str(colname) + " totals " + str(one_year_vol) + str(units) + " ," + str(two_year_vol) + str(units) + " and" + str(five_year_vol) + str(units) + " respectively." + " Based on the current " + str(colname) + " levels compared to the 3, 6 and 9 month rolling averages, the current level reflects a change of " + str(today_vs_1_rolling) + ", " + str(today_vs_2_rolling) + " and " + str(today_vs_3_rolling) + " respectively.")
    print(summary)
As I describe above, I am hoping to have code that produces a paragraph summary of the financial metrics I calculate in the for loop, one for each variable.
The problem is that you are selecting the entire dataframe rather than each column on its own; hence, every calculation was done on both columns at once. I also extracted just the values required from your operations rather than keeping the full text that Pandas prints for a Series.
This should work:
df = pd.read_csv('2019.03 HOUST and GDP.csv')
df = df.loc[:, ['Housing Starts', 'Unemployment Rate']]
for idx, col in enumerate(df.columns):
    freq = "monthly "
    units = " million "
    colname = col
    selectedCol = df.loc[:, [col]]
    alltime = selectedCol.mean()[0]
    low = selectedCol.min()[0]
    maximum = selectedCol.max()[0]
    today = selectedCol.iloc[720][0]
    one_year = selectedCol.iloc[709:721].mean()[0]
    two_year = selectedCol.iloc[697:721].mean()[0]
    five_year = selectedCol.iloc[661:721].mean()[0]
    one_year_vol = selectedCol.iloc[709:721].std()[0]
    two_year_vol = selectedCol.iloc[697:721].std()[0]
    five_year_vol = selectedCol.iloc[661:721].std()[0]
    today_vs_1 = ((today/one_year) -1)*100
    today_vs_2 = ((today/two_year) -1)*100
    today_vs_5 = ((today/five_year) -1)*100
    rolling_1 = selectedCol.rolling(window=3).mean()
    rolling_2 = selectedCol.rolling(window=6).mean()
    rolling_3 = selectedCol.rolling(window=9).mean()
    today_vs_1_rolling = ((today/rolling_1.iloc[720]) -1)*100
    today_vs_2_rolling = ((today/rolling_2.iloc[720]) -1)*100
    today_vs_3_rolling = ((today/rolling_3.iloc[720]) -1)*100
    summary = ("The " + str(freq) + str(colname) + " currently stands at " + str(today) + str(units) + " which compares to the 1,2 and 5 year averages of " + str(one_year) + str(units) + "," + str(two_year) + str(units) + "," + " and " + str(five_year) + str(units) + " respectively. " + " Based on the current " + str(colname) + " levels, that reflects a change of" + str(today_vs_1) + ", " + str(today_vs_2) + " and " + str(today_vs_5) + " respectively." " Since the metric began being tracked, the minimum, maximum and long run average total " + str(low) + str(units) + ", " + str(maximum) + str(units) + " and " + str(alltime) + str(units) + " respectively. " "The 1, 2 and 5 year standard deviation for " + str(colname) + " totals " + str(one_year_vol) + str(units) + " ," + str(two_year_vol) + str(units) + " and" + str(five_year_vol) + str(units) + " respectively." + " Based on the current " + str(colname) + " levels compared to the 3, 6 and 9 month rolling averages, the current level reflects a change of " + str(today_vs_1_rolling[0]) + ", " + str(today_vs_2_rolling[0]) + " and " + str(today_vs_3_rolling[0]) + " respectively.")
    print(summary)
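A shorter variant of the same idea, as a sketch under assumptions: selecting a column as a Series (df[col]) makes mean(), min() and similar calls return plain numbers, so the trailing [0] lookups are not needed. Those lookups rely on positional indexing into a label-indexed Series, which newer pandas versions warn about. The sample data, the summarize helper and the use of the last row as "today" are hypothetical; the real file would use row 720 and the windows from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the csv: 24 monthly observations per variable.
df = pd.DataFrame({
    "Housing Starts": np.arange(1.0, 25.0),
    "Unemployment Rate": np.linspace(4.0, 6.3, 24),
})


def summarize(series, name, units=""):
    """Build a one-sentence summary for a single column (a pandas Series)."""
    today = series.iloc[-1]              # assumption: the last row is the latest value
    one_year = series.iloc[-12:].mean()  # scalar, because series is one-dimensional
    change = (today / one_year - 1) * 100
    return (f"{name} currently stands at {today:.2f}{units}, "
            f"{change:+.1f}% against its 12-period average of {one_year:.2f}{units}.")


# One summary per column, keyed by column name.
summaries = {col: summarize(df[col], col) for col in df.columns}
for text in summaries.values():
    print(text)
```

Putting the calculations in a function that receives one Series keeps the loop itself to a single line, and the remaining statistics (two and five year averages, standard deviations, rolling means) can be added inside summarize in the same way.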

How to put data together into a file?

I would like to collect different types of data into a file. Here is part of the code.
val = str(float(data[-1]))
val_dB = float(val)
val_dB = math.log(val_dB, 10) * 10
myfile = open('../../../MLI_values/mli_value.txt', 'a')
myfile.write(date_ID + " " + val + val_dB + "\n")
myfile.close()
But it gives back an error:
myfile.write(date_ID + " " + val + val_dB + "\n")
TypeError: cannot concatenate 'str' and 'float' objects
How can I fix this so that the values are written together, in columns, to a file?
Change:
myfile.write(date_ID + " " + val + val_dB + "\n")
to:
myfile.write(date_ID + " " + val + " " + str(val_dB) + "\n")
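An alternative sketch that avoids manual str() calls: format the whole line in one step, so numbers are converted for you. The function name format_measurement and the sample date string are made up for illustration, and math.log10 is swapped in for math.log(x, 10), which computes the same logarithm.

```python
import math


def format_measurement(date_id, raw_value):
    """Return one space-separated line: date, value, value in dB."""
    value = float(raw_value)
    value_db = 10 * math.log10(value)
    # str.format converts the floats itself, so no str + float concatenation occurs.
    return "{} {} {:.2f}\n".format(date_id, value, value_db)


line = format_measurement("2014-05-01", "100")  # hypothetical inputs
print(line, end="")
```

The returned string can be passed straight to myfile.write(). Opening the file with a with statement would also close it automatically, even if the write fails.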

Adding the values of two strings using Python and XML path

It generates an output with wallTime and setupwalltime into a dat file, which has the following format:
24000 4 0
81000 17 0
192000 59 0
648000 250 0
1536000 807 0
3000000 2144 0
6591000 5699 0
I would like to know how to add the two values (walltime and setupwalltime) together. Can someone give me a hint? I tried converting to float, but it doesn’t seem to work.
import libxml2
import os.path
from numpy import *
from cfs_utils import *
np=[1,2,3,4,5,6,7,8]
n=[20,30,40,60,80,100,130]
solver=["BiCGSTABL_iluk", "BiCGSTABL_saamg", "BiCGSTABL_ssor" , "CG_iluk", "CG_saamg", "CG_ssor" ]# ,"cholmod", "ilu" ]
file_list=["eval_BiCGSTABL_iluk_default", "eval_BiCGSTABL_saamg_default" , "eval_BiCGSTABL_ssor_default" , "eval_CG_iluk_default","eval_CG_saamg_default", "eval_CG_ssor_default" ] # "simp_cholmod_solver_3D_evaluate", "simp_ilu_solver_3D_evaluate" ]
for cnt_np in np:
    i=0
    for sol in solver:
        #open write_file= "Graphs/" + "Np"+ cnt_np + "/CG_iluk.dat"
        #"Graphs/Np1/CG_iluk.dat"
        write_file = open("Graphs/"+ "Np"+ str(cnt_np) + "/" + sol + ".dat", "w")
        print("Reading " + "Graphs/"+ "Np"+ str(cnt_np) + "/" + sol + ".dat"+ "\n")
        #loop through different unknowns
        for cnt_n in n:
            #open file "cfs_calculations_" + cnt_n +"np"+ cnt_np+ "/" + file_list(i) + "_default.info.xml"
            read_file = "cfs_calculations_" +str(cnt_n) +"np"+ str(cnt_np) + "/" + file_list[i] + ".info.xml"
            print("File list" + file_list[i] + "value of i " + str(i) + "\n")
            print("Reading " + " cfs_calculations_" +str(cnt_n) +"np"+ str(cnt_np) + "/" + file_list[i] + ".info.xml" )
            #read wall and cpu time and write
            if os.path.exists(read_file):
                doc = libxml2.parseFile(read_file)
                xml = doc.xpathNewContext()
                walltime = xpath(xml, "//cfsInfo/sequenceStep/OLAS/mechanic/solver/summary/solve/timer/@wall")
                setupwalltime = xpath(xml, "//cfsInfo/sequenceStep/OLAS/mechanic/solver/summary/setup/timer/@wall")
                # cputime = xpath(xml, "//cfsInfo/sequenceStep/OLAS/mechanic/solver/summary/solve/timer/@cpu")
                # setupcputime = xpath(xml, "//cfsInfo/sequenceStep/OLAS/mechanic/solver/summary/solve/timer/@cpu")
                unknowns = 3*cnt_n*cnt_n*cnt_n
                write_file.write(str(unknowns) + "\t" + walltime + "\t" + setupwalltime + "\n")
                print("Writing_point" + str(unknowns) + "%f" ,float(setupwalltime ) )
                doc.freeDoc()
                xml.xpathFreeContext()
        write_file.close()
        i=i+1
In Java you can add strings and floats directly. As I understand it, you need to add the values and then display them. This should work (converting the sum to a string):
write_file.write(str(unknowns) + "\t" + str(float(walltime) + float(setupwalltime)) + "\n")
You are trying to add a str to a float. That doesn't work. If you want to use string concatenation, first coerce all of the values to str. Try this:
write_file.write(str(unknowns) + "\t" + str(float(walltime) + float(setupwalltime)) + "\n")
Or, perhaps more readably:
totalwalltime = float(walltime) + float(setupwalltime)
write_file.write("{}\t{}\n".format(unknowns, totalwalltime))
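To make the difference concrete, here is a small self-contained sketch. It assumes, as the original write call suggests, that the xpath helper returns the timer values as strings; the function names and sample values are hypothetical.

```python
def total_wall_time(walltime, setupwalltime):
    """Convert both timer strings to float before adding them."""
    return float(walltime) + float(setupwalltime)


def format_row(unknowns, walltime, setupwalltime):
    """Build one tab-separated line: number of unknowns, summed wall time."""
    return "{}\t{}\n".format(unknowns, total_wall_time(walltime, setupwalltime))


# Adding the raw strings concatenates them instead of summing.
concatenated = "4" + "0.5"

# 3 * n**3 unknowns for n = 20, with hypothetical timer values.
row = format_row(3 * 20 ** 3, "4", "0.5")
print(repr(concatenated), repr(row))
```

Here concatenated is the text '40.5', while the row holds the numeric sum 4.5, which is the behaviour the question asks for.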

How can I improve this messy function or its reiteration?

This is a very long and annoying question; I'll do my best to explain it.
I have a file containing data such as
Sarah;Brown;s.brown@gmail.com;0715123451;1;24;0;0
Joe;Blogg;j.bloggs@gmail.com;0749814574;1;60;0;0
Andrew;Smith;a.smith@gmail.com;0718451658;1;45;0;0
Ray;Charles;r.charles@gmail.com;0715451589;1;40;0;0
Kevin;White;k.white@gmail.com;0749858748;1;20;0;0
Samantha;Collins;s.collins@gmail.com;0715243568;1;10;0;0
Frank;Jones;f.jones@gmail.com;0719487516;2;10;0;0
Liam;Blair;l.blair@gmail.com;0729857614;2;4;0;0
Pat;Phillips;p.phillips@gmail.com;071574216;2;17;0;0
John;Brown;j.brown@gmail.com;0798452648;2;11;0;0
Peter;Bond;p.bond@gmail.com;0798415758;6;4;0;0
Edward;Costello;e.costello@gmail.com;0712474588;2;45;0;0
Iain;Wilkins;i.wilkins@gmail.com;0715497211;2;23;0;0
Time;Pratchett;t.pratchett@gmail.com;0784975135;3;48;0;0
Eleanor;House;e.house@gmail.com;0799871542;3;9;0;0
Gergory;Davies;g.davies@gmail.com;0719475847;3;22;0;0
Tina;Turner;t.turner@gmail.com;0749857123;3;17;0;0
Sally;Stevens;s.stevens@gmail.com;077154198;3;30;0;0
John;Lennon;j.lennon@gmail.com;0704910874;3;29;0;0
The first element is the name, the second is the surname, the third is the email address, the fourth is the phone number, the fifth is the division number and the sixth is the points scored; the remaining two don't matter for this issue.
What I have been asked to do is to find the top two scorers (6th element) and the bottom two scorers in a division (5th element). Once they have been identified, the top two should be promoted (go up a division) and the bottom two should be demoted (go down a division). I have written this piece of crap:
def rollDivision2():
    lst = [line.strip().split(';') for line in open('players.txt','r').readlines()] # creates nested list
    totalPoints = []
    for i in range(len(lst)):
        if lst[i][4] == "2": # checking divisions this is why i need 6 different function
            totalPoints.append(int(lst[i][5])) # creates lists from all scores for the division chosen
    #----------------------------------------------- figuring out best two scores and writing
    maxPoints = max(totalPoints)
    for person in lst:
        if person[4] == "2" and person[5] == str(maxPoints): #this is why i need 6 different function
            biggest = person # creating variable with name of person that has the biggest score
            biggestStr = biggest[0] + ";" + biggest[1] + ";" + biggest[2] + ";" + biggest[3] + ";" + biggest[4] + ";" + biggest[5] + ";" + biggest[6] + ";" + biggest[7] + "\n" #puts that lists into a string
            break #personWithMostPoints is the whole line of player with most points
    secondMaxPoints = secondLargest(totalPoints) #this is why i need 6 different function
    for person in lst:
        if person[4] == "2" and person[5] == str(secondMaxPoints): #checking for most points in the division
            secondBiggest = person
            secondBiggestStr = secondBiggest[0] + ";" + secondBiggest[1] + ";" + secondBiggest[2] + ";" + secondBiggest[3] + ";" + secondBiggest[4] + ";" + secondBiggest[5] + ";" + secondBiggest[6] + ";" + secondBiggest[7] + "\n"
            break
    lineToWriteBiggest = biggest[0] + ";" + biggest[1] + ";" + biggest[2] + ";" + biggest[3] + ";" + "1" + ";" + biggest[5] + ";" + biggest[6] + ";" + biggest[7] + "\n"
    lineToWriteSecondBiggest = secondBiggest[0] + ";" + secondBiggest[1] + ";" + secondBiggest[2] + ";" + secondBiggest[3] + ";" + "1" + ";" + secondBiggest[5] + ";" + secondBiggest[6] + ";" + secondBiggest[7] + "\n"
    #----------------------------------------------- figuring out best two scores
    #----------------------------------------------- figuring out worst two scores
    minPoints = min(totalPoints)
    for person in lst:
        if person[4] == "2" and person[5] == str(minPoints): #this is why i need 6 different function
            least = person
            leastStr = least[0] + ";" + least[1] + ";" + least[2] + ";" + least[3] + ";" + least[4] + ";" + least[5] + ";" + least[6] + ";" + least[7] + "\n"
            break #personWithMostPoints is the whole line of player with most points
    secondLeastPoints = secondSmallest(totalPoints) #method defined in utility functions
    for person in lst:
        if person[4] == "2" and person[5] == str(secondLeastPoints): #this is why i need 6 different function
            secondLeast = person
            secondLeastStr = secondLeast[0] + ";" + secondLeast[1] + ";" + secondLeast[2] + ";" + secondLeast[3] + ";" + secondLeast[4] + ";" + secondLeast[5] + ";" + secondLeast[6] + ";" + secondLeast[7] + "\n"
            break
    lineToWriteLeast = least[0] + ";" + least[1] + ";" + least[2] + ";" + least[3] + ";" + "3" + ";" + least[5] + ";" + least[6] + ";" + least[7] + "\n"
    lineToWriteSecondLeast = secondLeast[0] + ";" + secondLeast[1] + ";" + secondLeast[2] + ";" + secondLeast[3] + ";" + "3" + ";" + secondLeast[5] + ";" + secondLeast[6] + ";" + secondLeast[7] + "\n"
    f = open("players.txt","a")
    f.write(lineToWriteBiggest)
    f.write(lineToWriteSecondBiggest)
    f.write(lineToWriteLeast)
    f.write(lineToWriteSecondLeast)
    f.close()
    f = open("players.txt",'r') # Input file
    t = open("temp.txt", 'w') #Temp output file
    for line in f:
        if line != biggestStr and line != secondBiggestStr and line != leastStr and line != secondLeastStr and line != "\n":
            t.write(line) #writes all lines apart from the original line (one that needs to be deleted)
    f.close()
    t.close()
    os.remove("players.txt") #deletes players
    os.rename('temp.txt', 'players.txt') #new file with modified info is renamed to players
It is very ugly and impractical; moreover, I have to have 6 of these functions (as there are 6 divisions), which makes the program ridiculously bloated.
Could anyone help me use this function just once to check all 6 divisions, instead of having to write 6 individual ones?
I'm sorry you had to see this, I just don't know what else to do at this point.
Any help would be very appreciated.
Many thanks
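import itertools

POINTS = 5  # the sixth field holds the points scored
division_of = lambda line: int(line.split(";")[4])
# groupby only merges adjacent lines, so sort by division first
lines = sorted((l.strip() for l in open("some.txt") if l.strip()), key=division_of)
for division, items in itertools.groupby(lines, division_of):
    sorted_people = sorted(items, key=lambda item: int(item.split(";")[POINTS]))
    print("DIVISION:", division)
    print("TOP TWO:", sorted_people[-2:])
    print("BOTTOM TWO:", sorted_people[:2])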
Check this out -
def convert(arr):
    return [int(x) if x.isnumeric() else x for x in arr]
def find_top_two(division):
    cur_div = [x for x in arr if x[4] == division]
    cur_div.sort(key = lambda T : T[5], reverse = True)
    return cur_div[:2]
def find_bottom_two(division):
    cur_div = [x for x in arr if x[4] == division]
    cur_div.sort(key = lambda T : T[5])
    return cur_div[:2]
with open('players.txt', 'r') as f:
    arr = f.readlines()
arr = [convert(s.strip('\n').split(';')) for s in arr]
print('Top 2 of division 2 :\n', find_top_two(2))
print('Bottom 2 of division 2 :\n', find_bottom_two(2))
Output -
Top 2 of division 2 :
[['Edward', 'Costello', 'e.costello@gmail.com', 712474588, 2, 45, 0, 0], ['Iain', 'Wilkins', 'i.wilkins@gmail.com', 715497211, 2, 23, 0, 0]]
Bottom 2 of division 2 :
[['Liam', 'Blair', 'l.blair@gmail.com', 729857614, 2, 4, 0, 0], ['Frank', 'Jones', 'f.jones@gmail.com', 719487516, 2, 10, 0, 0]]
Not the best or most efficient code, but this should work. I'll leave you to fill in the blanks.
def getPlayersOfDivision(playerInfo, division):
    result = []
    for player in playerInfo:
        if int(player[4]) == division:
            result.append(player)
    return result
def findPlayer(playerInfo, firstName, lastName):
    for i,player in enumerate(playerInfo):
        if player[0] == firstName and player[1] == lastName:
            return i
def get2BestPlayers(playerInfo):
    pass
def get2WorstPlayers(playerInfo):
    pass
def changePlayerDivision(playerInfo, pos, upOrDown):
    pass
with open('players.txt', 'r') as f:
    playerInfo = f.readlines()
playerInfo = [line.strip().split(';') for line in playerInfo]
numDivisions = ???
for i in range(numDivisions):
    divisionPlayers = getPlayersOfDivision(playerInfo, i)
    bestPlayers = get2BestPlayers(divisionPlayers)
    worstPlayers = get2WorstPlayers(divisionPlayers)
    newPlayerInfo = changePlayerDivision(playerInfo, findPlayer(playerInfo, divisionPlayers[bestPlayers[0]][0], divisionPlayers[bestPlayers[0]][1]), +1)
    newPlayerInfo = changePlayerDivision(playerInfo, findPlayer(playerInfo, divisionPlayers[bestPlayers[1]][0], divisionPlayers[bestPlayers[1]][1]), +1)
    newPlayerInfo = changePlayerDivision(playerInfo, findPlayer(playerInfo, divisionPlayers[worstPlayers[0]][0], divisionPlayers[worstPlayers[0]][1]), -1)
    newPlayerInfo = changePlayerDivision(playerInfo, findPlayer(playerInfo, divisionPlayers[worstPlayers[1]][0], divisionPlayers[worstPlayers[1]][1]), -1)
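For completeness, here is one way the whole task could be handled by a single function for every division, offered as a sketch rather than a definitive solution. It follows the question's convention that promotion lowers the division number (division 2 players move to 1 or 3). The function name roll_divisions, the rule of skipping divisions with fewer than four players, and breaking ties by file order are assumptions not stated in the question.

```python
from collections import defaultdict

DIVISION, POINTS = 4, 5   # positions of division and points in each record
LOWEST_DIVISION = 6       # the question mentions six divisions


def roll_divisions(records):
    """Return copies of the records with the top two players of every division
    moved up (division number - 1) and the bottom two moved down (+ 1)."""
    members = defaultdict(list)
    for index, record in enumerate(records):
        members[int(record[DIVISION])].append(index)

    moves = {}
    for division, indexes in members.items():
        if len(indexes) < 4:
            continue  # assumption: too few players to pick two and two
        ranked = sorted(indexes, key=lambda i: int(records[i][POINTS]))
        for i in ranked[-2:]:   # highest scores
            moves[i] = max(1, division - 1)
        for i in ranked[:2]:    # lowest scores
            moves[i] = min(LOWEST_DIVISION, division + 1)

    rolled = []
    for index, record in enumerate(records):
        updated = list(record)  # copy, so the input records stay unchanged
        if index in moves:
            updated[DIVISION] = str(moves[index])
        rolled.append(updated)
    return rolled


lines = [
    "Frank;Jones;f.jones@gmail.com;0719487516;2;10;0;0",
    "Liam;Blair;l.blair@gmail.com;0729857614;2;4;0;0",
    "Pat;Phillips;p.phillips@gmail.com;071574216;2;17;0;0",
    "John;Brown;j.brown@gmail.com;0798452648;2;11;0;0",
    "Edward;Costello;e.costello@gmail.com;0712474588;2;45;0;0",
    "Iain;Wilkins;i.wilkins@gmail.com;0715497211;2;23;0;0",
]
records = [line.split(";") for line in lines]
rolled = roll_divisions(records)
new_lines = [";".join(record) for record in rolled]
for line in new_lines:
    print(line)
```

All moves are decided from the original standings before any record is changed, so a player promoted into a division is not ranked again in the same pass. Keeping every field as a string preserves the leading zeros in the phone numbers, and new_lines can be written back to players.txt in one go instead of appending and deleting lines.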
