Python while loop not updating the DataFrame column calculations - python

I am writing a python code where I have a condition which till the time it is true I want the calculations to happen and update the dataframe columns. However I am noticing that the dataframe is not getting updated and all the values are of the 1st iteration only. Can an expert guide on where I am going wrong. Below is my sample code -
'''
mbd_out_ub2 = mbd_out_ub1
mbd_out_ub2_len = len(mbd_out_ub2)
plt_mbd_c1_all = pd.DataFrame()
brd2c2_all = pd.DataFrame()
iterc=1
### plt_mbd_c >> this is the data frame with data before the loop starts
plt_mbd_c0 = plt_mbd_c.copy()
plt_mbd_c0 = plt_mbd_c0[plt_mbd_c0['UB_OUT']==1]
while (iterc < 10):
plt_mbd_c1 = plt_mbd_c0.copy()
brd2c2 = plt_mbd_c1.groupby('KEY1')['NEST_VAL_PER'].agg([('KEY1_CNT','count'),('PER1c', lambda x: x.quantile(0.75))]).reset_index()
brd2c2_all = brd2c2_all.append(brd2c2).reset_index(drop=True)
plt_mbd_c1 = pd.merge(plt_mbd_c1,brd2c2[['KEY1','PER1c']],on='KEY1', how='left')
del brd2c2, plt_mbd_c0
plt_mbd_c1['NEST_VAL_PER1'] = plt_mbd_c1['PER1c'] * (plt_mbd_c1['EVAL_LP_%'] / 100)
plt_mbd_c1['NEST_VAL_PER1'] = np.where((plt_mbd_c1['BRD_OUT_FLAG'] == 0),plt_mbd_c1['NEST_VAL'],plt_mbd_c1['NEST_VAL_PER1'] )
plt_mbd_c1['SALESC'] = plt_mbd_c1['NEST_VAL_PER1']/plt_mbd_c1['PROJR']/plt_mbd_c1['NEWPRICE']
plt_mbd_c1['C_SALES_C'] = np.where(plt_mbd_c1['OUT_FLAG'] == 1,plt_mbd_c1['SALESC'],plt_mbd_c1['SALESUNIT'])
plt_mbd_c1['NEST_VAL_PER'] = plt_mbd_c1['C_SALES_C'] * plt_mbd_c1['PROJR'] * plt_mbd_c1['NEWPRICE']
plt_mbd_c1['ITER'] = iterc
plt_mbd_c1_all = plt_mbd_c1_all.append(plt_mbd_c1).reset_index(drop=True)
plt_mbd_c1.drop(['PER1c'],axis=1,inplace=True)
plt_mbd_c0 = plt_mbd_c1.copy()
del plt_mbd_c1
print("iter = ",iterc)
iterc = iterc + 1
'''
So above I want to take 75th percentile of a column by KEY1 and do few calculations. The idea is after every iteration my 75th percentile will keep reducing as I am updating the same column with calculated value which would be lower then the current value (since it is based on 75th percentile). However when I check I find for all the iterations the values are same as the 1st iteration only. I have tried to delete the data frames, save to temp data frame, copy dataframe but non seem to be working.
Please help !!

Related

Creating a Dictionary and then a dataframe from different Types

I'm having a problem manipulating different types. Here is what I'm doing:
--
row_list=[]
for x in a:
df=selectinfo(x)
analysis=df[['Sales']].copy()
decompose_result_mult = seasonal_decompose(analysis, model="additive")
date = decompose_result_mult.trend.to_frame()
date.reset_index(inplace=True)
date=date.iloc[:,0]
trend = decompose_result_mult.trend
trend = decompose_result_mult.trend.to_frame()
trend.reset_index(inplace=True)
trend=trend.iloc[:,1]
seasonal = decompose_result_mult.seasonal
seasonal = decompose_result_mult.seasonal.to_frame()
seasonal.reset_index(inplace=True)
seasonal=seasonal.iloc[:,1]
residual = decompose_result_mult.resid
residual = decompose_result_mult.resid.to_frame()
residual.reset_index(inplace=True)
residual=residual.iloc[:,1]
observed=decompose_result_mult.observed
observed=decompose_result_mult.observed.to_frame()
observed.reset_index(inplace=True)
observed=observed.iloc[:,1]
dict= {'Date':date,'region':x,'Trend':trend,'Seasonal':seasonal,'Residual':residual,'Sales':observed}
row_list.append(dict)
pandasTable=pd.DataFrame(row_list)
The problem with this is that when I run my code I get the next result:
It is a dataframe that have inside Series... Any help? I would like to have 1 value per row and not 1 list per row.
Thank you!

Python DataFrame repeating column dynamically with calculation

I have a function that will do some calculations made in other parts of the code. I have it dynamically repeat the calculations based on the number of input (binary) a user indicates in an earlier section. The calculations are working but I cannot get this to create multiple columns in the dataframe. How can I get the function to create a new column for each repeat of the "while" loop? Is that even possible? I tried creating 9 columns to see if that would work but I get a "mismatched columns" error.
def multipleldlc():
outputldlc = pd.DataFrame(columns=["ldldcisfhprob1", "ldldcisfhprob2", "ldldcisfhprob3", "ldldcisfhprob4", "ldldcisfhprob5", "ldldcisfhprob6", "ldldcisfhprob7", "ldldcisfhprob8", "ldldcisfhprob9"])
y = 0
numcols = len(binary[0])
print(numcols)
while y < numcols:
for item in binary:
for i, j in pinput.iterrows():
agej =j.values[0]
ldlcj = j.values[1]
totalcj = j.values[2]
ldlcisfh = ldlcfh(agej, ldlcj, totalcj)
ldlcisnotfh = ldlcfh(agej, ldlcj, totalcj)
if item[y] == 0:
ldlcprob = ldlcisnotfh
outputldlc.loc[i] = [ldlcprob]
else:
ldlcprob = ldlcisfh
outputldlc.loc[i] = [ldlcprob]
y += 1
print(outputldlc)
return outputldlc

How to insert a value in google sheets through python just when the cell is empty/null

I'm trying to automate googlesheets through python, and every time my DF query runs, it inserts the data with the current day.
To put it simple, when a date column is empty, it have to be fulfilled with date when the program runs. The image is:
EXAMPLE IMAGE
I was trying to do something like it:
ws = client.open("automation").worksheet('sheet2')
ws.update(df_h.fillna('0').columns.values.tolist())
I'm not able to fulfill just the empty space, seems that or all the column is replaced, or all rows, etc.
Solved it thorugh another account:
ws_date_pipe = client.open("automation").worksheet('sheet2')
# Range of date column (targeted one, which is the min range)
next_row_min = str(len(list(filter(None, ws_date_pipe.col_values(8))))+1)
# Range of first column (which is the max range)
next_row_max = str(len(list(filter(None, ws_date_pipe.col_values(1)))))
cell_list = ws_date_pipe.range(f"H{next_row_min}:H{next_row_max}")
cell_values = []
# Difference between max-min ranges, space that needs to be fulfilled
for x in range(0, ((int(next_row_max)+1)-int(next_row_min)), 1):
iterator = x
iterator = datetime.datetime.now().strftime("%Y-%m-%d")
iterator = str(iterator)
cell_values.append(iterator)
for i, val in enumerate(cell_values):
cell_list[i].value = val
# If date range len "next_row_min" is lower than the first column, then fill.
if int(next_row_min) < int(next_row_max)+1:
ws_date_pipe.update_cells(cell_list)
print(f'Saved to csv file. {datetime.datetime.now().strftime("%Y-%m-%d")}')

Randomization of a list with conditions using Pandas

I'm new to any kind of programming as you can tell by this 'beautiful' piece of hard coding. With sweat and tears (not so bad, just a little), I've created a very sequential code and that's actually my problem. My goal is to create a somewhat-automated script - probably including for-loop (I've unsuccessfully tried).
The main aim is to create a randomization loop which takes original dataset looking like this:
dataset
From this data set picking randomly row by row and saving it one by one to another excel list. The point is that the row from columns called position01 and position02 should be always selected so it does not match with the previous pick in either of those two column values. That should eventually create an excel sheet with randomized rows that are followed always by a row that does not include values from the previous pick. So row02 should not include any of those values in columns position01 and position02 of the row01, row3 should not contain values of the row2, etc. It should also iterate in the range of the list length, which is 0-11. Important is also the excel output since I need the rest of the columns, I just need to shuffle the order.
I hope my aim and description are clear enough, if not, happy to answer any questions. I would appreciate any hint or help, that helps me 'unstuck'. Thank you. Code below. (PS: I'm aware of the fact that there is probably much more neat solution to it than this)
import pandas as pd
import random
dataset = pd.read_excel("C:\\Users\\ibm\\Documents\\Psychopy\\DataInput_Training01.xlsx")
# original data set use for comparisons
imageDataset = dataset.loc[0:11, :]
# creating empty df for storing rows from imageDataset
emptyExcel = pd.DataFrame()
randomPick = imageDataset.sample() # select randomly one row from imageDataset
emptyExcel = emptyExcel.append(randomPick) # append a row to empty df
randomPickIndex = randomPick.index.tolist() # get index of the row
imageDataset2 = imageDataset.drop(index=randomPickIndex) # delete the row with index selected before
# getting raw values from the row 'position01'/02 are columns headers
randomPickTemp1 = randomPick['position01'].values[0]
randomPickTemp2 = randomPick
randomPickTemp2 = randomPickTemp2['position02'].values[0]
# getting a dataset which not including row values from position01 and position02
isit = imageDataset2[(imageDataset2.position01 != randomPickTemp1) & (imageDataset2.position02 != randomPickTemp1) & (imageDataset2.position01 != randomPickTemp2) & (imageDataset2.position02 != randomPickTemp2)]
# pick another row from dataset not including row selected at the beginning - randomPick
randomPick2 = isit.sample()
# save it in empty df
emptyExcel = emptyExcel.append(randomPick2, sort=False)
# get index of this second row to delete it in next step
randomPick2Index = randomPick2.index.tolist()
# delete the another row
imageDataset3 = imageDataset2.drop(index=randomPick2Index)
# AND REPEAT the procedure of comparison of the raw values with dataset already not including the original row:
randomPickTemp1 = randomPick2['position01'].values[0]
randomPickTemp2 = randomPick2
randomPickTemp2 = randomPickTemp2['position02'].values[0]
isit2 = imageDataset3[(imageDataset3.position01 != randomPickTemp1) & (imageDataset3.position02 != randomPickTemp1) & (imageDataset3.position01 != randomPickTemp2) & (imageDataset3.position02 != randomPickTemp2)]
# AND REPEAT with another pick - save - matching - picking again.. until end of the length of the dataset (which is 0-11)
So at the end I've used a solution provided by David Bridges (post from Sep 19 2019) on psychopy websites. In case anyone is interested, here is a link: https://discourse.psychopy.org/t/how-do-i-make-selective-no-consecutive-trials/9186
I've just adjusted the condition in for loop to my case like this:
remaining = [choices[x] for x in choices if last['position01'] != choices[x]['position01'] and last['position01'] != choices[x]['position02'] and last['position02'] != choices[x]['position01'] and last['position02'] != choices[x]['position02']]
Thank you very much for the helpful answer! and hopefully I did not spam it over here too much.
import itertools as it
import random
import pandas as pd
# list of pair of numbers
tmp1 = [x for x in it.permutations(list(range(6)),2)]
df = pd.DataFrame(tmp1, columns=["position01","position02"])
df1 = pd.DataFrame()
i = random.choice(df.index)
df1 = df1.append(df.loc[i],ignore_index = True)
df = df.drop(index = i)
while not df.empty:
val = list(df1.iloc[-1])
tmp = df[(df["position01"]!=val[0])&(df["position01"]!=val[1])&(df["position02"]!=val[0])&(df["position02"]!=val[1])]
if tmp.empty: #looped for 10000 times, was never empty
print("here")
break
i = random.choice(tmp.index)
df1 = df1.append(df.loc[i],ignore_index = True)
df = df.drop(index=i)

Output unique values from a pandas dataframe without reordering the output

I know that a few posts have been made regarding how to output the unique values of a dataframe without reordering the data.
I have tried many times to implement these methods, however, I believe that the problem relates to how the dataframe in question has been defined.
Basically, I want to look into the dataframe named "C", and output the unique values into a new dataframe named "C1", without changing the order in which they are stored at the moment.
The line that I use currently is:
C1 = pd.DataFrame(np.unique(C))
However, this returns an ascending order list (while, I simply want the list order preserved only with duplicates removed).
Once again, I apologise to the advanced users who will look at my code and shake their heads -- I'm still learning! And, yes, I have tried numerous methods to solve this problem (redefining the C dataframe, converting the output to be a list etc), to no avail unfortunately, so this is my cry for help to the Python gods. I defined both C and C1 as dataframes, as I understand that these are pretty much the best datastructures to house data in, such that they can be recalled and used later, plus it is quite useful to name the columns without affecting the data contained in the dataframe).
Once again, your help would be much appreciated.
F0 = ('08/02/2018','08/02/2018',50)
F1 = ('08/02/2018','09/02/2018',52)
F2 = ('10/02/2018','11/02/2018',46)
F3 = ('12/02/2018','16/02/2018',55)
F4 = ('09/02/2018','28/02/2018',48)
F_mat = [[F0,F1,F2,F3,F4]]
F_test = pd.DataFrame(np.array(F_mat).reshape(5,3),columns=('startdate','enddate','price'))
#convert string dates into DateTime data type
F_test['startdate'] = pd.to_datetime(F_test['startdate'])
F_test['enddate'] = pd.to_datetime(F_test['enddate'])
#convert datetype to be datetime type for columns startdate and enddate
F['startdate'] = pd.to_datetime(F['startdate'])
F['enddate'] = pd.to_datetime(F['enddate'])
#create contract duration column
F['duration'] = (F['enddate'] - F['startdate']).dt.days + 1
#re-order the F matrix by column 'duration', ensure that the bootstrapping
#prioritises the shorter term contracts
F.sort_values(by=['duration'], ascending=[True])
# create prices P
P = pd.DataFrame()
for index, row in F.iterrows():
new_P_row = pd.Series()
for date in pd.date_range(row['startdate'], row['enddate']):
new_P_row[date] = row['price']
P = P.append(new_P_row, ignore_index=True)
P.fillna(0, inplace=True)
#create C matrix, which records the unique day prices across the observation interval
C = pd.DataFrame(np.zeros((1, intNbCalendarDays)))
C.columns = tempDateRange
#create the Repatriation matrix, which records the order in which contracts will be
#stored in the A matrix, which means that once results are generated
#from the linear solver, we know exactly which CalendarDays map to
#which columns in the results array
#this array contains numbers from 1 to NbContracts
R = pd.DataFrame(np.zeros((1, intNbCalendarDays)))
R.columns = tempDateRange
#define a zero filled matrix, P1, which will house the dominant daily prices
P1 = pd.DataFrame(np.zeros((intNbContracts, intNbCalendarDays)))
#rename columns of P1 to be the dates contained in matrix array D
P1.columns = tempDateRange
#create prices in correct rows in P
for i in list(range(0, intNbContracts)):
for j in list(range(0, intNbCalendarDays)):
if (P.iloc[i, j] != 0 and C.iloc[0,j] == 0) :
flUniqueCalendarMarker = P.iloc[i, j]
C.iloc[0,j] = flUniqueCalendarMarker
P1.iloc[i,j] = flUniqueCalendarMarker
R.iloc[0,j] = i
for k in list(range(j+1,intNbCalendarDays)):
if (C.iloc[0,k] == 0 and P.iloc[i,k] != 0):
C.iloc[0,k] = flUniqueCalendarMarker
P1.iloc[i,k] = flUniqueCalendarMarker
R.iloc[0,k] = i
elif (C.iloc[0,j] != 0 and P.iloc[i,j] != 0):
P1.iloc[i,j] = C.iloc[0,j]
#convert C dataframe into C_list, in prepataion for converting C_list
#into a unique, order preserved list
C_list = C.values.tolist()
#create C1 matrix, which records the unique day prices across unique days in the observation period
C1 = pd.DataFrame(np.unique(C))
Use DataFrame.duplicated() to check if your data-frame contains any duplicate or not.
If yes then you can try DataFrame.drop_duplicate() .

Categories