I am writing Python code in which, as long as a condition is true, I want some calculations to run and update the DataFrame columns. However, the DataFrame is not being updated: all the values are those of the first iteration. Can an expert point out where I am going wrong? Below is my sample code:
'''
mbd_out_ub2 = mbd_out_ub1
mbd_out_ub2_len = len(mbd_out_ub2)
plt_mbd_c1_all = pd.DataFrame()
brd2c2_all = pd.DataFrame()
iterc=1
### plt_mbd_c >> this is the data frame with data before the loop starts
plt_mbd_c0 = plt_mbd_c.copy()
plt_mbd_c0 = plt_mbd_c0[plt_mbd_c0['UB_OUT']==1]
while (iterc < 10):
    plt_mbd_c1 = plt_mbd_c0.copy()
    brd2c2 = plt_mbd_c1.groupby('KEY1')['NEST_VAL_PER'].agg([('KEY1_CNT','count'),('PER1c', lambda x: x.quantile(0.75))]).reset_index()
    brd2c2_all = brd2c2_all.append(brd2c2).reset_index(drop=True)
    plt_mbd_c1 = pd.merge(plt_mbd_c1,brd2c2[['KEY1','PER1c']],on='KEY1', how='left')
    del brd2c2, plt_mbd_c0
    plt_mbd_c1['NEST_VAL_PER1'] = plt_mbd_c1['PER1c'] * (plt_mbd_c1['EVAL_LP_%'] / 100)
    plt_mbd_c1['NEST_VAL_PER1'] = np.where((plt_mbd_c1['BRD_OUT_FLAG'] == 0),plt_mbd_c1['NEST_VAL'],plt_mbd_c1['NEST_VAL_PER1'] )
    plt_mbd_c1['SALESC'] = plt_mbd_c1['NEST_VAL_PER1']/plt_mbd_c1['PROJR']/plt_mbd_c1['NEWPRICE']
    plt_mbd_c1['C_SALES_C'] = np.where(plt_mbd_c1['OUT_FLAG'] == 1,plt_mbd_c1['SALESC'],plt_mbd_c1['SALESUNIT'])
    plt_mbd_c1['NEST_VAL_PER'] = plt_mbd_c1['C_SALES_C'] * plt_mbd_c1['PROJR'] * plt_mbd_c1['NEWPRICE']
    plt_mbd_c1['ITER'] = iterc
    plt_mbd_c1_all = plt_mbd_c1_all.append(plt_mbd_c1).reset_index(drop=True)
    plt_mbd_c1.drop(['PER1c'],axis=1,inplace=True)
    plt_mbd_c0 = plt_mbd_c1.copy()
    del plt_mbd_c1
    print("iter = ",iterc)
    iterc = iterc + 1
'''
So above I want to take the 75th percentile of a column by KEY1 and do a few calculations. The idea is that after every iteration my 75th percentile will keep decreasing, because I am updating the same column with a calculated value that should be lower than the current value (since it is based on the 75th percentile). However, when I check, the values for all iterations are the same as in the first iteration. I have tried deleting the DataFrames, saving to a temporary DataFrame, and copying the DataFrame, but none of these seems to work.
Please help!
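One way to check the intended behaviour in isolation is a toy version of the loop. The sketch below is not the original calculation: the data, the function name `iterate_percentile`, and the `scale` factor are all made up for illustration. It only shows the pattern of feeding each iteration's result into the next one and collecting the per-iteration frames with `pd.concat`.

```python
import pandas as pd

def iterate_percentile(df, n_iter=3, scale=0.9):
    """Repeatedly replace NEST_VAL_PER by a scaled per-KEY1 75th percentile.

    Hypothetical helper: `scale` stands in for the real business calculation.
    """
    frames = []
    current = df.copy()
    for it in range(1, n_iter + 1):
        # Per-group 75th percentile, broadcast back to every row of the group.
        p75 = current.groupby('KEY1')['NEST_VAL_PER'].transform(
            lambda s: s.quantile(0.75)
        )
        # assign() returns a new frame, so the next pass sees the updated column.
        current = current.assign(NEST_VAL_PER=p75 * scale, ITER=it)
        frames.append(current)
    return pd.concat(frames, ignore_index=True)

# Toy data, made up for this sketch.
toy = pd.DataFrame({
    'KEY1': ['a', 'a', 'a', 'b', 'b'],
    'NEST_VAL_PER': [10.0, 20.0, 30.0, 5.0, 15.0],
})
result = iterate_percentile(toy)
per_iter_max = result.groupby('ITER')['NEST_VAL_PER'].max()
print(per_iter_max)
```

If a toy version like this changes from one iteration to the next but the real loop does not, the difference is likely in the intermediate columns (for example, flags that route every row back to an unchanged source column) rather than in the copying of the DataFrame.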
Here is the code I am working with.
dfs=dfs[['Reserved']] #the column that I need to insert
dfs=dfs.applymap(str) #json did not accept the nan so needed to convert
sh=gc.open_by_key('KEY') #would open the google sheet
sh_dfs=sh.get_worksheet(0) #getting the worksheet
sh_dfs.insert_rows(dfs.values.tolist()) #inserts the dfs into the new worksheet
Running this code inserts the rows starting at the first column of the worksheet, but what I am trying to accomplish is adding the column at the very end, in column P.
In your situation, how about the following modification? It first retrieves the maximum column count, converts the next column number to a column letter, and writes the values starting at the column after the last one.
From:
sh_dfs.insert_rows(dfs.values.tolist())
To:
# Ref: https://stackoverflow.com/a/23862195
def colnum_string(n):
    string = ""
    while n > 0:
        n, remainder = divmod(n - 1, 26)
        string = chr(65 + remainder) + string
    return string
values = sh_dfs.get_all_values()
col = colnum_string(max([len(r) for r in values]) + 1)
sh_dfs.update(col + '1', dfs.values.tolist(), value_input_option='USER_ENTERED')
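The column-letter helper can be checked without touching a spreadsheet. In this small sketch the `values` list is a hypothetical stand-in for what `sh_dfs.get_all_values()` might return for a sheet whose widest row fills columns A to O:

```python
def colnum_string(n):
    """Convert a 1-based column number to its A1-notation letters."""
    string = ""
    while n > 0:
        n, remainder = divmod(n - 1, 26)
        string = chr(65 + remainder) + string
    return string

# Hypothetical stand-in for sh_dfs.get_all_values(): widest row has 15 cells.
values = [["x"] * 15, ["y"] * 12]

# Next free column is one past the widest row.
next_col = colnum_string(max(len(r) for r in values) + 1)
target = next_col + "1"
print(target)
```

With 15 used columns the next free column is the 16th, which is column P, matching the target described in the question.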
Note:
If an error such as "exceeds grid limits" occurs, please insert a blank column first.
Reference:
Worksheet.update (gspread)
I'm trying to automate Google Sheets through Python so that every time my DataFrame query runs, it inserts the data along with the current date.
To put it simply: when a cell in the date column is empty, it has to be filled with the date on which the program runs. The image is:
[example image]
I tried something like this:
ws = client.open("automation").worksheet('sheet2')
ws.update(df_h.fillna('0').columns.values.tolist())
I'm not able to fill just the empty cells; it seems that either the whole column or all the rows get replaced.
Solved it through another account:
ws_date_pipe = client.open("automation").worksheet('sheet2')
# Range of date column (targeted one, which is the min range)
next_row_min = str(len(list(filter(None, ws_date_pipe.col_values(8))))+1)
# Range of first column (which is the max range)
next_row_max = str(len(list(filter(None, ws_date_pipe.col_values(1)))))
cell_list = ws_date_pipe.range(f"H{next_row_min}:H{next_row_max}")
cell_values = []
# Difference between the max and min rows: the cells that need to be filled
for x in range(0, ((int(next_row_max)+1)-int(next_row_min)), 1):
    iterator = x
    iterator = datetime.datetime.now().strftime("%Y-%m-%d")
    iterator = str(iterator)
    cell_values.append(iterator)
for i, val in enumerate(cell_values):
    cell_list[i].value = val
# If the date column is shorter than the first column, then fill.
if int(next_row_min) < int(next_row_max)+1:
    ws_date_pipe.update_cells(cell_list)
print(f'Saved to csv file. {datetime.datetime.now().strftime("%Y-%m-%d")}')
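The row arithmetic in this solution can be tested separately from gspread. The sketch below is only an illustration: `date_fill_plan` is a hypothetical helper, and the two lists stand in for `ws_date_pipe.col_values(8)` and `ws_date_pipe.col_values(1)`. Like the code above, it assumes the filled cells in each column are contiguous from the top.

```python
import datetime

def date_fill_plan(date_col, first_col, col_letter="H", today=None):
    """Return (A1 range, values) for the empty tail of the date column.

    Hypothetical helper; returns (None, []) when there is nothing to fill.
    """
    today = today or datetime.date.today().strftime("%Y-%m-%d")
    # First row after the last filled date cell.
    next_row_min = len([v for v in date_col if v]) + 1
    # Last row that has data in the first column.
    next_row_max = len([v for v in first_col if v])
    if next_row_min > next_row_max:
        return None, []
    count = next_row_max - next_row_min + 1
    a1_range = f"{col_letter}{next_row_min}:{col_letter}{next_row_max}"
    return a1_range, [today] * count

# Stand-ins for the two columns: header plus one dated row, four data rows.
date_col = ["Date", "2024-01-01", "", ""]
first_col = ["Name", "a", "b", "c"]
a1_range, fill_values = date_fill_plan(date_col, first_col, today="2024-02-01")
print(a1_range, fill_values)
```

Separating the range calculation from the API call makes it easier to confirm that only the empty cells are targeted before anything is written to the sheet.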
I'm new to Python and attempting to automate a report at my workplace to save time, space, and trouble. The report runs just fine, and almost all of my code to write the results into an Excel document works as expected as well. However, these two formats:
percent = wb.add_format({'num_format': '0.0%','border':1,'border_color':'white'})
integer = wb.add_format({'num_format': '#,##0','border':1,'border_color':'white'})
are behaving oddly. When I run this:
i = 10
for lob in report.index.get_level_values(1).unique():
    if report.loc[(program,lob)].sum().sum()==0:
        pass
    else:
        place=report.loc[(program,lob)]
        r=0
        for year in place.index:
            for item in range(8):
                ws.write(i+r,item+2,place.loc[year][item],integer)
            for item in range(9):
                ws.write(i+r,item+10,place.loc[year][item+8],percent)
            r+=1
        for col_num, value in enumerate(report.columns.values):
            ws.write(i-1, col_num + 2, value, headers)
        ws.write(i-1,1,lob,lobtitle)
        for row_num, year in enumerate(report.index.get_level_values(2).unique()):
            ws.write(i+row_num,1,year,bold)
        ws.set_row(i-1,40)
        ws.set_row(i+7,None,bold)
        i+=10
The first eight stats write in my "integer" format with white borders, but the next nine in the row write in percent format for the number, but with no border formatting at all (leaving the default Excel lines). In fact, throughout the report, anything I write with the "integer" format works out, and anything written with the "percent" format gives the correct number format without the border format.
The apparent simplicity of this issue is driving me crazy. Thanks for any help you can provide.
For reference, here's the full code. 'report' is a multi index dataframe with company programs as level 0, line of business (lob) as level 1, and the years 2015-2020 as level 2.
#Establish common formats
wb=xl.Workbook('Report.xlsx')
title=wb.add_format({'font_size':16,'font_name':'Calibri','align':'center','border':1,'border_color':'white'})
subtitle=wb.add_format({'font_size':14,'font_name':'Calibri','align':'center','border':1,'border_color':'white'})
blank=wb.add_format({'bg_color':'white'})
black=wb.add_format({'bg_color':'black'})
bold=wb.add_format({'bold':True,'border_color':'white'})
lobtitle=wb.add_format({'bold':True,'italic':True,'font_size':14})
wrap=wb.add_format({'text_wrap':True})
headers=wb.add_format({'bold':True,'text_wrap':True,'bg_color':'#DCDCDC','align':'center'})
percent = wb.add_format({'num_format': '0.0%','border':1,'border_color':'white'})
integer = wb.add_format({'num_format': '#,##0','border':1,'border_color':'white'})
shadepercent = wb.add_format({'num_format': '0.0%','border':1,'border_color':'white','bg_color':'#DCDCDC'})
shadeinteger = wb.add_format({'num_format': '#,##0','border':1,'border_color':'white','bg_color':'#DCDCDC'})
shadebold=wb.add_format({'bold':True,'border_color':'white','bg_color':'#DCDCDC'})
gridinteger=wb.add_format({'num_format': '#,##0','border':1,'border_color':'gray'})
gridpercent=percent = wb.add_format({'num_format': '0.0%','border':1,'border_color':'gray'})
#For every program, blank out all cells and add company title.
for program in report.index.get_level_values(0).unique():
    ws=wb.add_worksheet(program)
    for j in range(100):
        ws.set_row(j,None,blank)
    ws.set_column('B:S',15)
    ws.write(0,10,'Company Title',title)
    ws.write(1,10,'Report Name',subtitle)
    ws.write(2,10,'as of {}'.format(effective),subtitle)
    ws.write(3,10,'Detail',subtitle)
    #Check each lob within a program for nonzero values. For each nonzero lob, write the lob's stats.
    #Write the nonzero lob and its policy years from the index, and drop ten rows for the next entry.
    i = 10
    for lob in report.index.get_level_values(1).unique():
        if report.loc[(program,lob)].sum().sum()==0:
            pass
        else:
            place=report.loc[(program,lob)]
            r=0
            for year in place.index:
                for item in range(8):
                    ws.write(i+r,item+2,place.loc[year][item],integer)
                for item in range(9):
                    ws.write(i+r,item+10,place.loc[year][item+8],percent)
                r+=1
            for col_num, value in enumerate(report.columns.values):
                ws.write(i-1, col_num + 2, value, headers)
            ws.write(i-1,1,lob,lobtitle)
            for row_num, year in enumerate(report.index.get_level_values(2).unique()):
                ws.write(i+row_num,1,year,bold)
            ws.set_row(i-1,40)
            ws.set_row(i+7,None,bold)
            i+=10
The sample code isn't complete enough to say what the issue is but there shouldn't be any issue with the formats as this example shows:
import xlsxwriter
workbook = xlsxwriter.Workbook('test.xlsx')
worksheet = workbook.add_worksheet()
data1 = [1000, 1001, 1002]
data2 = [.35, .50, .75]
percent = workbook.add_format({'num_format': '0.0%', 'border': 1, 'border_color': 'white'})
integer = workbook.add_format({'num_format': '#,##0', 'border': 1, 'border_color': 'white'})
worksheet.write_column(2, 2, data1, integer)
worksheet.write_column(2, 4, data2, percent)
workbook.close()
Output: [screenshot of the resulting worksheet]
As a guess, the program may be overwriting the percent cells with another format, but it isn't possible to tell without a complete working example.
Update, based on the updated code from the OP:
From your updated code, it looks like you are overwriting the percent format here:
gridpercent = percent = wb.add_format(...)
This chained assignment rebinds percent to the new gray-bordered format, so every cell written with percent afterwards gets gray borders instead of white ones.
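The effect of the chained assignment can be reproduced without XlsxWriter. In this sketch `add_format` is a hypothetical stand-in that just returns a dict; it is not the library's method, but the name binding behaves the same way:

```python
def add_format(props):
    """Hypothetical stand-in for wb.add_format: returns a new format object."""
    return dict(props)

# Intended white-bordered percent format.
percent = add_format({'num_format': '0.0%', 'border': 1, 'border_color': 'white'})

# Chained assignment: BOTH names are bound to the new gray-bordered format.
gridpercent = percent = add_format({'num_format': '0.0%', 'border': 1, 'border_color': 'gray'})
overwritten_color = percent['border_color']

# Fix: assign only the new name, leaving percent untouched.
percent_fixed = add_format({'num_format': '0.0%', 'border': 1, 'border_color': 'white'})
gridpercent_fixed = add_format({'num_format': '0.0%', 'border': 1, 'border_color': 'gray'})

print(overwritten_color, percent_fixed['border_color'])
```

In the posted code, dropping the extra `percent =` from the `gridpercent` line should therefore be enough to keep the white borders on the percent cells.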
I have an xls file whose first column consists of many rows, for example:
MN
TN
RMON
BNE
RMGS
HUDGD
YINT
Then I want to pass the value of each cell to a function:
mystruc1 = make_structure("MN")
mystruc2 = make_structure("TN")
mystruc3 = make_structure("RMON")
mystruc4 = make_structure("BNE")
mystruc5 = make_structure("RMGS")
mystruc6 = make_structure("HUDGD")
mystruc7 = make_structure("YINT")
So each time, the value of one cell goes to the function.
Then I want to pass its output to another function:
out = Bio.PDB.PDBIO()
out.set_structure(mystruc1)
out.save( "MN001.pdb" )
out.set_structure(mystruc2)
out.save( "MN002.pdb" )
out.set_structure(mystruc3)
out.save( "MN003.pdb" )
out.set_structure(mystruc4)
out.save( "MN004.pdb" )
out.set_structure(mystruc5)
out.save( "MN005.pdb" )
out.set_structure(mystruc6)
out.save( "MN006.pdb" )
out.set_structure(mystruc7)
out.save( "MN007.pdb" )
This is how I would do it manually. I want to avoid doing it manually.
You can construct the filename using str.format (see Format String Syntax in the Python documentation):
>>> filename = '{}{:04}.pdb'
>>> filename.format('MN', 1)
'MN0001.pdb'
>>> filename.format('MN', 352)
'MN0352.pdb'
>>>
You can use enumerate while iterating over the sheet's rows to help construct the filename.
import xlrd
filename = '{}{:04}.pdb'
workbook = xlrd.open_workbook('test.xls')
for sheet in workbook.sheets():
    for n, row in enumerate(sheet.get_rows()):
        col_0 = row[0].value
        print(filename.format(col_0, n))
If you only want to iterate over the first column:
for sheet in workbook.sheets():
    for n, value in enumerate(sheet.col_values(0, start_rowx=0, end_rowx=None)):
        print(filename.format(value, n))
Or you can access the cell values directly:
for sheet in workbook.sheets():
    for i in range(sheet.nrows):
        rowi_col0 = sheet.cell_value(i, 0)
        print(filename.format(rowi_col0, i))
Once you have extracted a cell's value you can pass it to any function or method, just as the cell value is passed to str.format above.
mystruc = make_structure(value)
To automate processing the cell values, add your process to the loop.
for sheet in workbook.sheets():
    for i in range(sheet.nrows):
        rowi_col0 = sheet.cell_value(i, 0)
        # print(filename.format(rowi_col0, i))
        my_structure = make_structure(rowi_col0)
        out = Bio.PDB.PDBIO()
        out.set_structure(my_structure)
        out.save(filename.format(rowi_col0, i))
I don't have comment privileges to ask for clarification, so I'm going to answer this as best I can, and hopefully you can clarify if I'm going in the wrong direction.
From what you wrote, I'm assuming that you have some column, 'MN', and you want to name a bunch of files starting from 'MN001.pdb' all the way to 'MN0xx.pdb' (where xx is the last row you're working with).
One way you can achieve this is with a counter that increments on each iteration of your inner for loop.
colname = "MN"
for sheet in workbook.sheets():
    counter = 0
    for row in range(sheet.nrows):
        # pass your code here
        counter += 1
        # zero-pad the counter to three digits: 1 -> '001', 12 -> '012'
        s = colname + str(counter).zfill(3)
        ...
        out.save(s + '.pdb')
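The numbering scheme from the question (a fixed 'MN' prefix and a counter starting at 001) can be checked on its own before wiring it to xlrd and Bio.PDB. In this sketch `numbered_filenames` is a hypothetical helper and `cells` stands in for the first-column values read from the sheet:

```python
def numbered_filenames(values, prefix="MN", width=3):
    """Pair each cell value with a zero-padded, 1-based output filename."""
    return [
        (value, f"{prefix}{n:0{width}d}.pdb")
        for n, value in enumerate(values, start=1)
    ]

# Stand-in for the first column of the spreadsheet.
cells = ["MN", "TN", "RMON"]
pairs = numbered_filenames(cells)
for value, name in pairs:
    # In the real loop: build the structure from `value`, then save it to `name`.
    print(value, name)
```

Using `enumerate(..., start=1)` avoids maintaining the counter by hand, and the format spec handles the zero padding for any counter length.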