I have this kind of code, which checks the value in a column. If the condition is met, the code reads the value from another column of the same row and copies it over the value in column A:
counter = 0
list_of_winners = []
for each in data.iterrows():
    winner = data.iloc[counter, 5]
    if winner == 'Red':
        vitazr = data.iloc[counter, 0]
        list_of_winners.append(vitazr)
    elif winner == 'Blue':
        vitazb = data.iloc[counter, 1]
        list_of_winners.append(vitazb)
    elif winner == 'Draw':
        draw = str('Draw')
        list_of_winners.append(draw)
    else:
        pass
    counter += 1
The solution works for me: I am able to create the list, put it back into the original DataFrame, and replace the values I looped through.
What I want to ask: isn't there some other, more elegant and shorter way to attack this problem?
You can use np.select:
list_of_winners = np.select([data.iloc[:, 5] == 'Red',
                             data.iloc[:, 5] == 'Blue',
                             data.iloc[:, 5] == 'Draw'],
                            [data.iloc[:, 0], data.iloc[:, 1], 'Draw'],
                            default=None)
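As a minimal runnable sketch of this answer, with a small mock DataFrame standing in for `data` (the column names here are invented; all that matters is that column 5 holds the result and columns 0 and 1 hold the two contestants):

```python
import numpy as np
import pandas as pd

# Mock frame: column 0 = red corner, column 1 = blue corner, column 5 = result
data = pd.DataFrame({
    "red":    ["A", "B", "C", "D"],
    "blue":   ["X", "Y", "Z", "W"],
    "c2": 0, "c3": 0, "c4": 0,
    "result": ["Red", "Blue", "Draw", "NC"],
})

list_of_winners = np.select(
    [data.iloc[:, 5] == "Red",
     data.iloc[:, 5] == "Blue",
     data.iloc[:, 5] == "Draw"],
    [data.iloc[:, 0], data.iloc[:, 1], "Draw"],
    default=None,
)
print(list_of_winners)  # ['A' 'Y' 'Draw' None]
```

Each condition in the first list selects the matching choice from the second list, element-wise over the whole column, so the per-row loop disappears.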
I am running a for loop in order to create a dataframe of 'New' values.
New = 0
Approved = 0
df = pd.DataFrame()
for row, rowdata in enumerate(combined):
    for col, value in enumerate(rowdata.values()):
        if col == 0:
            print(value)
        if col == 2:
            New += value
            print('Original New')
            print(value)
        if col == 4:
            Approved = value
            if Approved > 0:
                New = New - Approved
                print('Updated New')
                print(New)
    df['New'] = New
Everything in this code seems to be working except for the last df['New'] = New statement. Any ideas on why that might be happening would be greatly appreciated.
df['New'] = New is a wrong way to insert a single row.
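A quick way to see why: assigning a scalar to a column of an empty DataFrame broadcasts it over zero rows, so the value silently disappears.

```python
import pandas as pd

df = pd.DataFrame()   # no rows yet
df['New'] = 5         # scalar is broadcast over zero rows
print(len(df))        # 0 -- the column exists but holds nothing
```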
One way to fix it:
all_rows = []
New = 0
Approved = 0
for row, rowdata in enumerate(combined):
    for col, value in enumerate(rowdata.values()):
        if col == 0:
            print(value)
        if col == 2:
            New += value
            print('Original New')
            print(value)
        if col == 4:
            Approved = value
            if Approved > 0:
                New = New - Approved
                print('Updated New')
                print(New)
    # Accumulate all the rows
    all_rows.append(New)

# Finally create a dataframe
df = pd.DataFrame({'New': all_rows})
I have an Excel file containing 3 columns (source, destination and time) and 140400 rows. I want to count rows that have the same source, destination and time values, i.e. rows recording packets from the same source to the same destination at the same time (e.g. row 1: 0, 1, 3 and row 102: 0, 1, 3 count as 2 matching rows). All the values are integers. I tried to use df.iloc but it just returns zero, and I tried a dictionary but couldn't make it work. I would appreciate it if someone could help me find a solution.
This is one way I tried, but it didn't work:
for t in timestamps:
    for x in range(120):
        for y in range(120):
            while i < 140400 and df.iloc[i, 0] <= t:
                # if df.iloc[i, 0] <= t:
                if df.iloc[i, 0] == t and df.iloc[i, 1] == y and df.iloc[i, 2] == x:
                    TotalArp[x][y] += 1
                i = i + 1
(The file format was shown as a screenshot in the original post.)
If I understood correctly, you just want to count rows that all have the same value, right? This should work, though it's probably not the most efficient way:
counter = 0
for index, row in df.iterrows():
    if row[0] == row[1] == row[2]:
        counter += 1
Edit:
OK, since I can't comment yet, I'll just edit it in here:
duplicate_count_df = df.groupby(df.columns.tolist(), as_index=False).size().drop_duplicates(subset=list(df.columns))
This should lead you into the right direction.
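As a sketch of the groupby idea on a small mock frame (the column names are invented to match the question's source/destination/time layout), `groupby` over all columns plus `.size()` counts how often each identical row repeats:

```python
import pandas as pd

df = pd.DataFrame({"source":      [0, 0, 1, 0],
                   "destination": [1, 1, 2, 1],
                   "time":        [3, 3, 5, 3]})

# One output row per distinct (source, destination, time) combination,
# with a 'size' column holding how many times it occurred
counts = df.groupby(df.columns.tolist(), as_index=False).size()
print(counts)
```

Here the row (0, 1, 3) appears three times, so its `size` is 3.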
Suppose you have these columns in your DataFrame:
["Col1" , "Col2" , "Col3" , "Col4"]
Now you want to count rows that contain equal values in each column of your DataFrame. Note that a chained == does not work element-wise between Series, so combine the comparisons with &:
len(df[(df['Col1'] == df['Col2']) & (df['Col2'] == df['Col3']) & (df['Col3'] == df['Col4'])])
Easy like that.
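A minimal runnable sketch of this mask-based counting, on mock data with three columns (a chained `a == b == c` between Series would raise, because a Series has no unambiguous truth value):

```python
import pandas as pd

df = pd.DataFrame({"Col1": [1, 2, 3],
                   "Col2": [1, 2, 9],
                   "Col3": [1, 5, 3]})

# Boolean mask: True where all three columns agree in that row
mask = (df["Col1"] == df["Col2"]) & (df["Col2"] == df["Col3"])
print(len(df[mask]))  # 1 -- only the first row has equal values everywhere
```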
Update:
If you would like to get the count for each element specifically:
# Create a dictionary holding a count for each element
Properties = {k: 0 for k in set([item for elem in df.columns for item in df[elem]])}
# Then count values that are equal in each row
for item in range(len(df)):
    if df.iloc[item, 0] == df.iloc[item, 1] == df.iloc[item, 2] == df.iloc[item, 3]:
        Properties[df.iloc[item, 0]] += 1
print(Properties)
Let's see an example:
# Here I have a DataFrame with 2 columns and 3 rows
df = pd.DataFrame({'1': [1, 2, 3], '2': [1, 1, '-']})
df
Output:
   1  2
0  1  1
1  2  1
2  3  -
And then:
Properties = {k: 0 for k in set([item for elem in df.columns for item in df[elem]])}
for item in range(len(df)):
    if df.iloc[item, 0] == df.iloc[item, 1]:
        Properties[df.iloc[item, 0]] += 1
Properties
Output:
{1: 1, 2: 0, 3: 0, '-': 0}
I have a list of zeros and ones.
I am trying to replace a value of 1 with a 0 if the previous value is also a 1, for the desired output shown below.
list = [1,1,1,0,0,0,1,0,1,1,0]
new_list = [1,0,0,0,0,0,1,0,1,0,0]
I've tried using a for loop to no avail. Any suggestions?
How about this for loop:
list = [1,1,1,0,0,0,1,0,1,1,0]
new_list = []
ant = 0
for i in list:
    if ant == 0 and i == 1:
        new_list.append(1)
    else:
        new_list.append(0)
    ant = i
question_list = [1,1,1,0,0,0,1,0,1,1,0]
new_list = [question_list[0]]  # notice we put the first element here
for i in range(1, len(question_list)):
    # check if the current and previous element are 1
    if question_list[i] == 1 and question_list[i - 1] == 1:
        new_list.append(0)
    else:
        new_list.append(question_list[i])
The idea here is we iterate over the list, while checking the previous element.
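The same pairing of each element with its predecessor can be written more compactly with `zip` (a variant not in the original answers, but with the same semantics: it compares against the previous *original* value):

```python
values = [1, 1, 1, 0, 0, 0, 1, 0, 1, 1, 0]

# Keep the first element; for the rest, zero out a 1 whose predecessor was a 1
out = [values[0]] + [0 if prev == 1 and cur == 1 else cur
                     for prev, cur in zip(values, values[1:])]
print(out)  # [1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0]
```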
I am trying to create some lag features by subtracting a month from each date in my datetime column and then assigning a column value from the past date to the current one.
This is my code:
for row_index in range(0, len(merger)):
    date = merger.loc[merger.index[row_index], 'datetime']
    prev = subtract_one_month(date)
    inde = merger.loc[merger['datetime'] == str(prev), 'count'].index.values.astype(int)
    if inde == []:
        continue
    else:
        inde = inde[0]
        merger.loc[merger.index[row_index], 'count_lag_month'] = merger.loc[merger.index[inde], 'count']
The inner if/else is meant to deal with cases where the date I'm looking for doesn't exist.
The code above simply gives me a column of NaNs. I would appreciate any help.
I've changed my code to the following:
first = []
mean = []
wrkday = []
count = []
for row_index in range(0, len(merger)):
    print(row_index)
    date = merger.loc[merger.index[row_index], 'datetime']
    prev = subtract_one_month(date)
    inde = merger.loc[merger['datetime'] == str(prev)].index.values.astype(int)
    if inde.size == 0:
        first.append(0)
        mean.append(0)
        wrkday.append(0)
    else:
        inde = inde[0]
        first.append(merger.loc[merger.index[inde], 'count'])
        mean.append(merger.loc[merger.index[inde], 'monthly_mean_count'])
        wrkday.append(merger.loc[merger.index[inde], 'monthly_wrkday_mean_count'])
    prev_day = subtract_one_day(date)
    inde = merger.loc[merger['datetime'] == str(prev_day)].index.values.astype(int)
    if inde.size == 0:
        count.append(0)
    else:
        inde = inde[0]
        count.append(merger.loc[merger.index[inde], 'count'])
merger['count_lag_month'] = first
merger['monthly_mean_count_lag_month'] = mean
merger['monthly_wrkday_mean_count_lag_month'] = wrkday
merger['count_lag_day'] = count
It uses lists instead and it seems to run at a decent speed. I'm not sure if it's the best approach though.
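A vectorized alternative worth considering (a sketch, not the original code): shift each date forward one month in a copy of the frame, then self-merge, so every row picks up last month's `count` without a Python loop. The column names come from the question; `pd.DateOffset(months=1)` stands in for the `subtract_one_month` helper, and assumes `datetime` is a real datetime column.

```python
import pandas as pd

merger = pd.DataFrame({
    "datetime": pd.to_datetime(["2020-01-01", "2020-02-01", "2020-03-01"]),
    "count": [10, 20, 30],
})

# Shift each row's date forward one month so it lines up with next month's row
lookup = merger[["datetime", "count"]].copy()
lookup["datetime"] = lookup["datetime"] + pd.DateOffset(months=1)
lookup = lookup.rename(columns={"count": "count_lag_month"})

# Left merge: rows with no matching previous month get NaN
merger = merger.merge(lookup, on="datetime", how="left")
print(merger["count_lag_month"].tolist())  # [nan, 10.0, 20.0]
```

The same pattern repeated with a one-day offset would cover the `count_lag_day` column.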
So here is my code, updating many column values based on a condition over the split values of the 'location' column. The code works, but since it iterates row by row it's not efficient enough. Can anyone help me make this code run faster, please?
for index, row in df.iterrows():
    print index
    location_split = row['location'].split(':')
    after_county = False
    after_province = False
    for l in location_split:
        if l.strip().endswith('ED'):
            df[index, 'electoral_district'] = l
        elif l.strip().startswith('County'):
            df[index, 'county'] = l
            after_county = True
        elif after_province == True:
            if l.strip() != 'Ireland':
                df[index, 'dublin_postal_district'] = l
        elif after_county == True:
            df[index, 'province'] = l.strip()
            after_province = True
'map' was what I needed :)
def fill_county(column):
    res = ''
    location_split = column.split(':')
    for l in location_split:
        if l.strip().startswith('County'):
            res = l.strip()
            break
    return res

df['county'] = map(fill_county, df['location'])
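One caveat: on Python 3 the builtin `map` returns an iterator rather than a list, so the pandas-native spelling is `Series.map`. A sketch with a mock `location` column (the sample strings are invented):

```python
import pandas as pd

def fill_county(column):
    # Return the first ':'-separated part that starts with 'County'
    for part in column.split(':'):
        if part.strip().startswith('County'):
            return part.strip()
    return ''

df = pd.DataFrame({'location': ['Dublin ED: County Dublin: Leinster',
                                'Leinster: Ireland']})
df['county'] = df['location'].map(fill_county)
print(df['county'].tolist())  # ['County Dublin', '']
```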