XLOOKUP with pandas in Python

I am very new to Python and I would like to use XLOOKUP to look up values in different columns (column "Debt", column "Liquidity", etc.) of a source database,
and fill the values into the cells (C17, C18, C19, ...) of a number of destination files which all have the same format.
import pandas as pd

path_source = r"C:\Test source.xlsx"
destination_file = r"C:Stress Test Q4 2022\test.xlsx"
df1 = pd.read_excel(path_source)
df2 = pd.read_excel(destination_file)

def xlookup(lookup_value, lookup_array, return_array, if_not_found: str = ''):
    match_value = return_array.loc[lookup_array == lookup_value]
    if match_value.empty:
        return f'"{lookup_value}" not found!' if if_not_found == '' else if_not_found
    else:
        return match_value.tolist()[0]

df2.iloc[2, 17] = df1["debt"].apply(xlookup, args=(main_df1["Fund name"], main_df1["fund_A"]))
This raises:
NameError: name 'main_df1' is not defined
Can anyone help correct the code? Thanks a lot!
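A minimal sketch of one possible fix. The immediate error is that main_df1 is never defined: the source frame was loaded as df1. Beyond that, xlookup returns a single scalar, so it is the destination cell, not a whole column, that should receive it. The column names "Fund name" and "Debt" and the key "fund_A" are taken from the snippet, so adjust them to the real headers:

import pandas as pd

def xlookup(lookup_value, lookup_array, return_array, if_not_found=''):
    # return the first value of return_array where lookup_array matches
    match_value = return_array.loc[lookup_array == lookup_value]
    if match_value.empty:
        return f'"{lookup_value}" not found!' if if_not_found == '' else if_not_found
    return match_value.tolist()[0]

df1 = pd.read_excel(r"C:\Test source.xlsx")  # source data
df2 = pd.read_excel(destination_file)        # one destination file

# Excel cell C17 is row 17, column C, i.e. iloc position (16, 2)
df2.iloc[16, 2] = xlookup("fund_A", df1["Fund name"], df1["Debt"])
df2.to_excel(destination_file, index=False)

Repeating the last three lines per destination file and per cell should cover C18, C19 and the rest.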

Related

Add a timestamp column to output table in Python Dash Example

I am trying to add a timestamp column to the table shown in this Python Dash Example:
https://github.com/plotly/dash-sample-apps/blob/main/apps/dash-image-annotation/app.py
The aim is to have a timestamp for each of the created objects.
So far, I have managed to:
Create a new column in the output table (line 48 of the GitHub code)
Append a timestamp by adding this to the "modify_table_entries" function (line 466 in the GitHub code):
annotations_table_data[0]["timestamp"] = time_passed(annotations_store_data["starttime"])
This gives me a timestamp only for the first entry in the output table.
I have been trying for several days now. I believe I have to somehow append the timestamp to each created object in this function of the code:
def modify_table_entries(
    previous_n_clicks,
    next_n_clicks,
    graph_relayoutData,
    annotations_table_data,
    image_files_data,
    annotations_store_data,
    annotation_type,
):
    cbcontext = [p["prop_id"] for p in dash.callback_context.triggered][0]
    if cbcontext == "graph.relayoutData":
        # debug_print("graph_relayoutData:", graph_relayoutData)
        # debug_print("annotations_table_data before:", annotations_table_data)
        if "shapes" in graph_relayoutData.keys():
            # this means all the shapes have been passed to this function via
            # graph_relayoutData, so we store them
            annotations_table_data = [
                shape_to_table_row(sh) for sh in graph_relayoutData["shapes"]
            ]
        elif re.match(r"shapes\[[0-9]+\].x0", list(graph_relayoutData.keys())[0]):
            # this means a shape was updated (e.g., by clicking and dragging its
            # vertices), so we just update the specific shape
            annotations_table_data = annotations_table_shape_resize(
                annotations_table_data, graph_relayoutData
            )
        if annotations_table_data is None:
            return dash.no_update
        else:
            debug_print("annotations_table_data after:", annotations_table_data)
            annotations_table_data[0]["timestamp"] = time_passed(annotations_store_data["starttime"])
            return (annotations_table_data, image_files_data)
    image_index_change = 0
    if cbcontext == "previous.n_clicks":
        image_index_change = -1
    if cbcontext == "next.n_clicks":
        image_index_change = 1
    image_files_data["current"] += image_index_change
    image_files_data["current"] %= len(image_files_data["files"])
    if image_index_change != 0:
        # image changed, update annotations_table_data with new data
        annotations_table_data = []
        filename = image_files_data["files"][image_files_data["current"]]
        # debug_print(annotations_store_data[filename])
        for sh in annotations_store_data[filename]["shapes"]:
            annotations_table_data.append(shape_to_table_row(sh))
        return (annotations_table_data, image_files_data)
    else:
        return dash.no_update
Any help is much appreciated!
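One approach that might work: rather than stamping only annotations_table_data[0], stamp every row just before returning. A minimal sketch, assuming each table row is a dict as in the repo code:

for row in annotations_table_data:
    # only stamp rows that don't have a timestamp yet, so objects keep
    # the time at which they were first created
    if not row.get("timestamp"):
        row["timestamp"] = time_passed(annotations_store_data["starttime"])

This would replace the single-index assignment in the else branch above.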

Pandas dataframe column needs to pass as an input to another function

I am a newbie in Python, so please help me out.
I have a function in my client API:
def captureserial() -> None:
    ser_no = xxxx
    inc_exp = "Yes"
    inc_exp_flag = inc_exp.lower()[0] == "y"
    args = {
        "serial_number": ser_no,
        "include_expired": inc_exp_flag,
    }
    ret, stat = apic_send_rq("certs", args)
    print(ser_no)
    qualify = "All" if inc_exp_flag else "Non-expired"
    output("%s case with serial number '%s'" % (qualify, ser_no), stat, ret)
This is how my dataframe is created:
sg.theme('Light Blue 2')
layout = [[sg.Text('Enter files to upload')],
          [sg.Text('File 1', size=(8, 1)), sg.Input(), sg.FileBrowse()],
          [sg.Submit(), sg.Cancel()]]
window = sg.Window('File', layout)
event, values = window.read()
window.close()
print(f'You clicked {event}')
print(f'You chose filenames {values[0]}')

if __name__ == '__main__':
    df = pd.read_excel(values[0], usecols=[0])
    print(df)
    df.columns = ['PluginOutput']
    df['Serial Number'] = df['PluginOutput'].apply(lambda x: find_serialnumber(x))
    df['Serial Number'] = df['Serial Number'].str.replace(' ', '')
    print(df)
I want to pass one of the columns (Serial Number) of my dataframe into the function captureserial(), so that it takes each serial number, processes the data accordingly, and gives me an output, which I then have to write to an Excel sheet.
How can I pass a single dataframe column to another function and capture the output returned by that function in an Excel sheet?
Thanks
When you use the apply method of a Series (df['PluginOutput'] is a Series object; check it out), each value is passed to the function given to apply. So the function should take a single PluginOutput value and return the desired value (not a Series and not a DataFrame, but the value itself). To save, just use the to_excel function, e.g. df.to_excel("output.xlsx", sheet_name='Sheet_name_1').
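Concretely, a sketch under those assumptions (apic_send_rq is taken from the question; the reworked captureserial takes one serial number and returns a value instead of printing it):

def captureserial(ser_no, include_expired=True):
    # query the API for one serial number and return the response
    args = {
        "serial_number": ser_no,
        "include_expired": include_expired,
    }
    ret, stat = apic_send_rq("certs", args)
    return ret

df['Result'] = df['Serial Number'].apply(captureserial)
df.to_excel("output.xlsx", sheet_name='Sheet_name_1', index=False)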

Select row data if other row data is matched

I have a dataframe:
import pandas as pd
df = pd.read_csv('data.csv')
df.head()
       title                                             poster
0  Toy Story  https://images-na.ssl-images-amazon.com/images...
1    Jumanji  https://images-na.ssl-images-amazon.com/images...
I want to create a function which will take a movie title as input and return the poster link as output. I tried the following, but it is not working:
def function_to_return_link(movie_name):
    if df['title'].str.contains(movie_name).any():
        print('Movie present in df')
        out = df.loc[df['title'] == movie_name]
        print(out)
    else:
        print('Movie is not present')
It shows the output as:
function_to_return_link('Toy Story')
Movie present in df
Empty DataFrame
Columns: [title, poster]
Index: []
df.loc[..., 'poster'] returns a pd.Series with your selected movie(s). Then, use pd.Series.iat to get the first value in the selection (by position). If the movie isn't present, it raises an IndexError.
def function_to_return_link(movie_name):
    posters = df.loc[df['title'].str.contains(movie_name), 'poster']
    try:
        link = posters.iat[0]
    except IndexError:
        print('Movie is not present')
    else:
        return link
Note that this doesn't account for duplicate entries (multiple matches). To deal with that, you could do the below (though it's arguably less Pythonic than try/except).
def function_to_return_link(movie_name):
    posters = df.loc[df['title'].str.contains(movie_name), 'poster']
    if len(posters) > 1:
        print('Multiple hits')
    elif len(posters) == 0:
        print('Movie is not present')
    else:
        return posters.iat[0]
Here's another way you could do it (testing membership against the column's values, and using query's @ syntax for the local variable):

def function_to_return_link(movie_name):
    if movie_name in df['title'].values:
        return df.query("title == @movie_name")['poster']
    else:
        print('Movie is not present')
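Note that this query() variant returns a pd.Series of matching posters rather than a bare link; assuming the movie is present, you can take the first match by position, as in the first answer:

link = function_to_return_link('Toy Story')
print(link.iat[0])  # first matching poster URL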

Adding multiple columns to a dataframe using df.apply and a lambda function

I am trying to add multiple columns to an existing dataframe with df.apply and a lambda function. I am able to add the columns one by one, but not all of them together.
My code:
def get_player_stats(player_name):
    print(player_name)
    resp = requests.get(player_id_api + player_name)
    if resp.status_code != 200:
        # This means something went wrong.
        print('Error {}'.format(resp.status_code))
    result = resp.json()
    player_id = result['data'][0]['pid']
    resp_data = requests.get(player_data_api + str(player_id))
    if resp_data.status_code != 200:
        # This means something went wrong.
        print('Error {}'.format(resp_data.status_code))
    result_data = resp_data.json()
    check1 = len(result_data.get('data', None).get('batting', None))
    # print(check1)
    check2 = len(result_data.get('data', {}).get('batting', {}).get('ODIs', {}))
    # check2 = result_data.get(['data']['batting']['ODIs'], None)
    # print(check2)
    if check1 > 0 and check2 > 0:
        total_6s = result_data['data']['batting']['ODIs']['6s']
        total_4s = result_data['data']['batting']['ODIs']['4s']
        average = result_data['data']['batting']['ODIs']['Ave']
        total_innings = result_data['data']['batting']['ODIs']['Inns']
        total_catches = result_data['data']['batting']['ODIs']['Ct']
        total_stumps = result_data['data']['batting']['ODIs']['St']
        total_wickets = result_data['data']['bowling']['ODIs']['Wkts']
        print(average, total_innings, total_4s, total_6s, total_catches, total_stumps, total_wickets)
        return np.array([average, total_innings, total_4s, total_6s, total_catches, total_stumps, total_wickets])
    else:
        print('No data for player')
        return '', '', '', '', '', '', ''

cols = ['Avg', 'tot_inns', 'tot_4s', 'tot_6s', 'tot_cts', 'tot_sts', 'tot_wkts']
for col in cols:
    players_available[col] = ''
players_available[cols] = players_available.apply(lambda x: get_player_stats(x['playerName']), axis=1)
I have tried adding the columns explicitly to the dataframe, but I am still getting an error:
ValueError: Must have equal len keys and value when setting with an iterable
Can someone help me with this?
It's tricky, since the pandas apply method has evolved through versions.
In my version (0.25.3), and also in other recent versions, it works if the function returns a pd.Series object.
In your code, you could change the two return statements in the function:
return pd.Series([average, total_innings, total_4s, total_6s,
                  total_catches, total_stumps, total_wickets])

return pd.Series(['', '', '', '', '', '', ''])
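A toy demonstration of why this works, with a stand-in function and data rather than the real API: when the applied function returns a pd.Series, apply(..., axis=1) produces a DataFrame whose columns line up with the assignment target:

import pandas as pd

def stats(name):
    # stand-in for get_player_stats: one pd.Series per row
    return pd.Series([len(name), name.upper()])

players = pd.DataFrame({'playerName': ['kohli', 'dhoni']})
players[['name_len', 'name_upper']] = players.apply(
    lambda x: stats(x['playerName']), axis=1)
print(players)
#   playerName  name_len name_upper
# 0      kohli         5      KOHLI
# 1      dhoni         5      DHONI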

Unexpected KeyError with for loop but not when manual

I have written a function that manually creates separate dataframes for each participant in the main dataframe. However, I'm trying to rewrite it so that it's more automated, as participants will be added to the dataframe in the future.
My original function:
def separate_participants(main_df):
    S001 = main_df[main_df['participant'] == 'S001']
    S001.name = "S001"
    S002 = main_df[main_df['participant'] == 'S002']
    S002.name = "S002"
    S003 = main_df[main_df['participant'] == 'S003']
    S003.name = "S003"
    S004 = main_df[main_df['participant'] == 'S004']
    S004.name = "S004"
    S005 = main_df[main_df['participant'] == 'S005']
    S005.name = "S005"
    S006 = main_df[main_df['participant'] == 'S006']
    S006.name = "S006"
    S007 = main_df[main_df['participant'] == 'S007']
    S007.name = "S007"
    participants = (S001, S002, S003, S004, S005, S006, S007)
    participant_names = (S001.name, S002.name, S003.name, S004.name, S005.name, S006.name, S007.name)
    return participants, participant_names
However, when I try to change this I get a KeyError for the name of the participant in the main_df. The code is as follows:
def separate_participants(main_df):
    participant_list = list(main_df.participant.unique())
    participants = []
    for participant in participant_list:
        name = participant
        temp_df = main_df[main_df[participant] == participant]
        name = temp_df
        participants.append(name)
    return participants
The error I get: KeyError: 'S001'
I can't seem to figure out what I'm doing wrong that makes it work in the old function but not the new one. The lengths of the values in the dataframe and the list are the same (4), so there are no extra characters.
Any help/pointers would be greatly appreciated!
Thanks @Iguananaut for the answer:
Your DataFrame has a column named 'participant' but you're indexing it with the value of the variable participant which is presumably not a column in your DataFrame. You probably wanted main_df['participant']. Most likely the KeyError came with a "traceback" leading back to the line temp_df = main_df[main_df[participant] == participant] which suggests you should examine it closely.
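Putting that pointer into code, a corrected sketch of the loop version (the same split as the original function, without the .name attributes):

def separate_participants(main_df):
    # index the 'participant' column by its name, not by the loop variable
    participant_names = list(main_df['participant'].unique())
    participants = [main_df[main_df['participant'] == name]
                    for name in participant_names]
    return participants, participant_names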
