if and for loop in one line - python

I am working with an Excel file via openpyxl, and I need to combine an elif statement with a for loop on the same pass over the data.
What I want to achieve is this:
Check whether the value is not None; if it is not None, loop over a column looking for the index of a matching value.
If you do not find the value in that column, check a second column, where the value might be. I need to create an elif statement that makes the program do something similar to what happens in the 'else:' branch, which I can handle since I can write it out over multiple lines.
The code I have:
for each in sheet['G'][1:]:
    indexing_no = int(sheet['G'].index(each) + 1)
    indexing_column = int(sheet['G'].index(each))
    if each.value == None:
        pass
    else:
        for search_value in sheet['A'][1:]:
            if each.value == search_value.value:
                index_no = int(sheet['A'].index(search_value) + 1)
                sheet['H{name}'.format(name=indexing_no)].value = sheet['B{name}'.format(name=index_no)].value

You could try this:
columns = ("G", "A", "D")  # column letters to search in parallel
values = ([(c, cell.value) for cell in sheet[c][1:]] for c in columns)
for row, colVal in enumerate(zip(*values), 1):
    col, value = next(((c, v) for c, v in colVal if v is not None), ("", None))
    if not col:
        continue  # or break when no column has any value
    # ...
    # perform common work on the first column that has a non-None value,
    # using col as the column letter and value as the value of the cell
    # at sheet[col][row]
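As a sanity check, here is that pattern run against a tiny throwaway workbook (the sheet layout and values are made up for illustration):

```python
from openpyxl import Workbook

# throwaway workbook: headers in row 1, made-up data below
wb = Workbook()
sheet = wb.active
sheet["G1"] = "G"
sheet["A1"] = "A"
sheet["D1"] = "D"
sheet["A2"] = "only A has a value in row 2"
sheet["G3"] = "G wins in row 3"

columns = ("G", "A", "D")
values = ([(c, cell.value) for cell in sheet[c][1:]] for c in columns)
hits = []
for row, colVal in enumerate(zip(*values), 1):
    # first (column, value) pair in G, A, D order whose value is not None
    col, value = next(((c, v) for c, v in colVal if v is not None), ("", None))
    if not col:
        continue
    hits.append((col, value))

print(hits)  # row 2 resolves to column A, row 3 to column G
```

Note the inner list comprehension: building the (column, value) pairs eagerly per column avoids the late-binding trap a nested generator expression would have, where every inner generator would see the last value of the loop variable.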

Different ways of iterating through pandas DataFrame

I am currently working on a short pandas project. The project assessment keeps marking this task as incorrect for me even though the resulting list appears to be the same as when the provided correct code is used. Is my code wrong and it just happens to give the same results for this particular DataFrame?
My code:
# Define an empty list
colors = []
# Iterate over rows of netflix_movies_col_subset
for t in netflix_movies_col_subset['genre']:
    if t == 'Children':
        colors.append('red')
    elif t == 'Documentaries':
        colors.append('blue')
    elif t == 'Stand-up':
        colors.append('green')
    else:
        colors.append('black')
# Inspect the first 10 values in your list
print(colors[:10])
Provided code:
Provided code:
# Define an empty list
colors = []
# Iterate over rows of netflix_movies_col_subset
for lab, row in netflix_movies_col_subset.iterrows():
    if row['genre'] == 'Children':
        colors.append('red')
    elif row['genre'] == 'Documentaries':
        colors.append('blue')
    elif row['genre'] == 'Stand-up':
        colors.append('green')
    else:
        colors.append('black')
# Inspect the first 10 values in your list
print(colors[0:10])
I've always been told that the best way to iterate over a DataFrame row by row is NOT TO DO IT.
In your case, you could very nicely use df.ne().
First create a DataFrame that holds all genres (df_genres), then use
netflix_movies_col_subset['genre'].ne(df_genres, axis=0)
This should create a DataFrame with a row for every movie and a column for every genre. Since ne() tests for inequality, a documentary would have True in every column except the Documentary column, where it would be False.
This method is multiple orders of magnitude faster than iterating with multiple if statements.
Does this help? I haven't tested it yet.
# Define an empty list
colors = []
# Iterate over rows of netflix_movies_col_subset
for t in netflix_movies_col_subset['genre']:
    if t == 'Children':
        x = 'red'
    elif t == 'Documentaries':
        x = 'blue'
    elif t == 'Stand-up':
        x = 'green'
    else:
        x = 'black'
    colors.append(x)
# Inspect the first 10 values in your list
print(colors[:10])
Or you can use match/case (Python 3.10+).
# Define an empty list
colors = []
# Iterate over rows of netflix_movies_col_subset
for t in netflix_movies_col_subset['genre']:
    match t:
        case 'Children':
            x = 'red'
        case 'Documentaries':
            x = 'blue'
        case 'Stand-up':
            x = 'green'
        case _:
            x = 'black'
    colors.append(x)
# Inspect the first 10 values in your list
print(colors[:10])
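If the goal is just a genre-to-color mapping, the whole loop can also be replaced by a vectorized Series.map call. A sketch, with made-up data standing in for netflix_movies_col_subset:

```python
import pandas as pd

# made-up stand-in for netflix_movies_col_subset
netflix_movies_col_subset = pd.DataFrame(
    {"genre": ["Children", "Documentaries", "Stand-up", "Dramas"]}
)

color_map = {"Children": "red", "Documentaries": "blue", "Stand-up": "green"}
# map known genres to colors; anything unmapped becomes NaN, then "black"
colors = netflix_movies_col_subset["genre"].map(color_map).fillna("black").tolist()
print(colors[:10])  # ['red', 'blue', 'green', 'black']
```

This produces the same list as the loop versions while letting pandas do the per-row work internally.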

Find list values in a column to delete odd ones out with Openpyxl

I have two workbooks, and I'm looking to grab column A from both of them to compare the cell values and see if there is a discrepancy.
If column A (in workbook1) != column A (in workbook2), delete the value in workbook1.
Here's what I have so far:
book1_list = []
book2_list = []
tempList = []
column_name = 'Numbers'
skip_Head_of_anotherSheet = anotherSheet[2: anotherSheet.max_row]
skip_Head_of_other = sheets[2: sheets.max_row]
for val1 in skip_Head_of_other:
    book1_list.append(val1[0].value)
for val2 in skip_Head_of_anotherSheet:
    book2_list.append(val2[0].value)
for i in book1_list:
    for j in book2_list:
        if i == j:
            tempList.append(j)
            print(j)
Here is where I get stuck:
for temp in tempList:
    for pointValue in skip_Head_of_anotherSheet:
        if temp != pointValue[0].value:
            anotherSheet.cell(column=4, row=pointValue[1].row, value="YES")
            # else:
            #     if temp != pointValue[0].value:
            #         anotherSheet.cell(column=4, row=pointValue[1].row, value="YES")
            #         anotherSheet.delete_rows(pointValue[0])
            #         anotherSheet.delete_rows(row[0].row, 1)
I also attempted to include to find the column by name:
for col in script.iter_cols():
    # see if the value of the first cell matches
    if col[0].value == column_value:
        # this is the column we want; col is an iterable of cells:
        for cell in col:
            pass  # do something with the cell in this column here
I'm not quite sure I understand what you want to do, but the following might help. When you want to check for membership in Python, use dictionaries and sets.
source = wb1["sheet"]
comparison = wb2["sheet"]
# create dictionaries of the cells in the worksheets keyed by cell value
source_cells = {row[0].value: row[0] for row in source.iter_rows(min_row=2, max_col=1)}
comparison_cells = {row[0].value: row[0] for row in comparison.iter_rows(min_row=2, max_col=1)}
shared = source_cells.keys() & comparison_cells.keys()  # set of values in both sheets
missing = comparison_cells.keys() - source_cells.keys()  # set of values only in the other sheet
for value in shared:
    cell = source_cells[value]
    cell.offset(column=3).value = "YES"
to_remove = sorted(comparison_cells[value].row for value in missing)  # rows to be removed
for r in reversed(to_remove):  # always remove rows from the bottom first
    comparison.delete_rows(r)
You'll probably need to adjust this to suit your needs, but I hope it helps.
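One detail worth spelling out: in Python 3, plain dicts do not support the `&` and `-` operators directly; it is the `.keys()` views that behave like sets. A minimal sketch with made-up values:

```python
# made-up dicts keyed by cell value, as in the answer above
source_cells = {"alpha": "A2", "beta": "A3"}
comparison_cells = {"alpha": "A2", "gamma": "A4"}

shared = source_cells.keys() & comparison_cells.keys()   # values present in both
missing = comparison_cells.keys() - source_cells.keys()  # values only in comparison

print(sorted(shared), sorted(missing))  # ['alpha'] ['gamma']
```

Key views support the full set algebra (`&`, `|`, `-`, `^`) without copying the keys into a separate set first.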
A dictionary solved the issue:
I turned tempList into a dictionary like so:
comp = dict.fromkeys(tempList)
So now it returns a dictionary.
Then, instead of looping over tempList, I only looped over the sheet.
Then in the if statement I checked whether the value is in the dictionary.
for pointValue in skip_Head_of_anotherSheet:
    if pointValue[21].value in comp:
        # anotherSheet.cell(column=23, row=pointValue[21].row, value="YES")
        anotherSheet.delete_rows(pointValue[21].row, 1)
    if pointValue[21].value not in comp:
        # anotherSheet.cell(column=23, row=pointValue[21].row, value="NO")
        pass

How to insert a value based on two cells within a row (Dataframe)?

I have the following DataFrame:
[image: DataFrame layout]
I have the following written:
if (df_.loc[(df_['Camera'] == camera1) & (df_['Return'].isnull())]):
    df_.loc[(df_['Camera'] == camera1) & (df_['Return'].isnull()), 'Return'] = time_
    df_.to_csv(csv_file, index=False)
else:
    df_ = df_.append(dfin, ignore_index=True)
    df_.to_csv(csv_file, index=False)
...
...
camera1 is input from the user, and time_ is today's date/time.
When user input is submitted, the first condition checks whether the input is in the first column, 'Camera', and the second condition checks whether the 'Return' column is empty; if both are true, add the current date/time (time_), else create a new row with the new info.
I'm just looking for a way to add the time_ value into the row where the user input already exists in the DataFrame and the 'Return' column is empty.
Was able to get what I wanted with the following:
def check():
    subset = df_.loc[(df_['Camera'] == camera1) & (df_['Return'].isnull())]
    if subset.size != 0:
        return subset.iat[0, 4]
    else:
        pass
Then:
if check():
    df_.loc[(df_['Camera'] == camera1) & (df_['Return'].isnull()), 'Return'] = time_
    df_.to_csv(csv_file, index=False)
    #....
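The same check can be done without the helper function by testing the boolean mask directly with .any(). A minimal sketch, with made-up data standing in for df_, camera1, and time_:

```python
import pandas as pd

# made-up stand-ins for the question's df_, camera1, and time_
df_ = pd.DataFrame({"Camera": ["cam1", "cam2"], "Return": [None, "2023-01-01"]})
camera1 = "cam1"
time_ = "2023-01-02"

# rows where the camera matches and 'Return' is still empty
mask = (df_["Camera"] == camera1) & (df_["Return"].isnull())
if mask.any():
    df_.loc[mask, "Return"] = time_

print(df_.loc[0, "Return"])  # 2023-01-02
```

Building the mask once also avoids repeating the same .loc condition in both the test and the assignment.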

Counting combinations in a DataFrame to create a new DataFrame

So I have a DataFrame called reactions_drugs, and I want to create a table called new_r_d where I keep track of how often I see a symptom for a given medication.
Here is the code I have, but I am running into errors such as "Unable to coerce to Series, length must be 3: given 0".
new_r_d = pd.DataFrame(columns=['drugname', 'reaction', 'count'])
for i in range(len(reactions_drugs)):
    name = reactions_drugs.drugname[i]
    drug_rec_act = reactions_drugs.drug_rec_act[i]
    for rec in drug_rec_act:
        row = new_r_d.loc[(new_r_d['drugname'] == name) & (new_r_d['reaction'] == rec)]
        if row == []:
            # create new row
            new_r_d.append({'drugname': name, 'reaction': rec, 'count': 1})
        else:
            new_r_d.at[row, 'count'] += 1
Assuming the rows in your current reactions (drug_rec_act) column contain one string enclosed in a list, you can convert the values in that column to lists of strings (by splitting each string on the comma delimiter) and then utilize the explode() function and value_counts() to get your desired result:
df['drug_rec_act'] = df['drug_rec_act'].apply(lambda x: x[0].split(','))
df_long = df.explode('drug_rec_act')
result = df_long.groupby('drugname')['drug_rec_act'].value_counts().reset_index(name='count')
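Run on a tiny made-up frame matching the answer's assumption (each row holding one comma-separated string inside a list), the approach looks like this:

```python
import pandas as pd

# made-up stand-in for reactions_drugs
df = pd.DataFrame({
    "drugname": ["aspirin", "ibuprofen"],
    "drug_rec_act": [["nausea,headache,nausea"], ["rash"]],
})

# split the single string in each list into a list of reaction strings
df["drug_rec_act"] = df["drug_rec_act"].apply(lambda x: x[0].split(","))
# one row per (drug, reaction) pair
df_long = df.explode("drug_rec_act")
result = df_long.groupby("drugname")["drug_rec_act"].value_counts().reset_index(name="count")
print(result)
```

The result has one row per (drugname, reaction) combination with its count, e.g. aspirin/nausea appears twice here.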

How to retrieve the column name and row name with a condition satisfied in a dataframe?

I need to check the condition that the sum of a column is 1, and where it is satisfied I want to retrieve the column name and row number in a dictionary.
The output should be list1 = ({8: 1004}, {9: 1001}).
I have tried some Python code but couldn't move forward with it.
list1 = []
for Emp in SkillsA:
    sum_row = (SkillsA.sum(axis=0))
    # print(sum_row)
    # print((Skills_A[0]))
    if sum_row[Emp] == 1:
        # print(Emp)
        for ws in SkillsA:
            # if SkillsA[ws][Emp] == 1:
            print(SkillsA[ws][Emp])
            # list1.update({Emp:ws})
With pandas you can do it like this:
import pandas as pd
# Import data
df = pd.read_excel("location_file")
# Create a dictionary (named result to avoid shadowing the built-in dict)
result = {}
# Iterate over columns
for i in df.columns:
    if df[i].sum() == 1:
        result[i] = df.Employee_No[df[i] == 1]  # Add filtered data to the dict
Based on describing the problem as
Find a mapping of column labels to row labels for the 1s in columns having only a single 1,
it can also be done with a one-line function:
def indices_of_single_ones_by_column(df):
    return [{col: df[col].idxmax()} for col in df.columns if df[col].sum() == 1]
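A quick check of that function on a made-up skills matrix (employee numbers as the index, skills as columns):

```python
import pandas as pd

def indices_of_single_ones_by_column(df):
    # for each column containing exactly one 1, idxmax() returns the row label of that 1
    return [{col: df[col].idxmax()} for col in df.columns if df[col].sum() == 1]

# made-up skills matrix: rows are employee numbers, columns are skills
df = pd.DataFrame(
    {"welding": [1, 0, 0], "coding": [1, 1, 0], "driving": [0, 0, 1]},
    index=[1001, 1002, 1003],
)

print(indices_of_single_ones_by_column(df))  # [{'welding': 1001}, {'driving': 1003}]
```

Only welding and driving have a column sum of 1, so coding (with two 1s) is skipped.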