Unexpected KeyError with for loop but not when manual

I have written a function that manually creates separate dataframes for each participant in the main dataframe. However, I'm trying to write it so that it's more automated as participants will be added to the dataframe in the future.
My original function:
def separate_participants(main_df):
    S001 = main_df[main_df['participant'] == 'S001']
    S001.name = "S001"
    S002 = main_df[main_df['participant'] == 'S002']
    S002.name = "S002"
    S003 = main_df[main_df['participant'] == 'S003']
    S003.name = "S003"
    S004 = main_df[main_df['participant'] == 'S004']
    S004.name = "S004"
    S005 = main_df[main_df['participant'] == 'S005']
    S005.name = "S005"
    S006 = main_df[main_df['participant'] == 'S006']
    S006.name = "S006"
    S007 = main_df[main_df['participant'] == 'S007']
    S007.name = "S007"
    participants = (S001, S002, S003, S004, S005, S006, S007)
    participant_names = (S001.name, S002.name, S003.name, S004.name, S005.name, S006.name, S007.name)
    return participants, participant_names
However, when I try to rewrite it with a loop, I get a KeyError for the name of the participant in main_df. The code is as follows:
def separate_participants(main_df):
    participant_list = list(main_df.participant.unique())
    participants = []
    for participant in participant_list:
        name = participant
        temp_df = main_df[main_df[participant] == participant]
        name = temp_df
        participants.append(name)
    return participants
The error I get: KeyError: 'S001'
I can't figure out what I'm doing wrong: the filter works in the old function but not in the new one. The participant IDs in the dataframe and in the list have the same length (4 characters), so there are no extra characters.
Any help/pointers would be greatly appreciated!

Thanks @Iguananaut for the answer:
Your DataFrame has a column named 'participant' but you're indexing it with the value of the variable participant which is presumably not a column in your DataFrame. You probably wanted main_df['participant']. Most likely the KeyError came with a "traceback" leading back to the line temp_df = main_df[main_df[participant] == participant] which suggests you should examine it closely.
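The fix described above can be sketched as follows. This is a minimal illustration, not the asker's final code: the sample data is made up, and returning a dict keyed by participant ID (instead of a list) is an assumption made so that each sub-frame keeps its name.

```python
import pandas as pd

def separate_participants(main_df):
    """Return a dict mapping each participant ID to its own sub-DataFrame."""
    participants = {}
    for participant in main_df['participant'].unique():
        # Select the column by its name ('participant'), then compare its
        # values with the current ID held in the loop variable.
        participants[participant] = main_df[main_df['participant'] == participant]
    return participants

# Hypothetical sample data, for illustration only.
main_df = pd.DataFrame({
    'participant': ['S001', 'S002', 'S001', 'S003'],
    'score': [1, 2, 3, 4],
})
frames = separate_participants(main_df)
print(list(frames))
print(frames['S001'])
```

Iterating over main_df.groupby('participant') yields the same (ID, sub-frame) pairs without the explicit filter, which may be preferable as the number of participants grows.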

Related

Xlookup with panda in Python

I am very new to Python and I would like to use an XLOOKUP-style function to look up values in different columns (column "Debt", column "Liquidity", etc.) of a database
and fill those values into the cells (C17, C18, C19, ...) of a number of destination files that share the same format.
path_source = r"C:\Test source.xlsx"
destination_file = r"C:Stress Test Q4 2022\test.xlsx"
df1 = pd.read_excel(path_source)
df2 = pd.read_excel(destination_file)

def xlookup(lookup_value, lookup_array, return_array, if_not_found: str = ''):
    match_value = return_array.loc[lookup_array == lookup_value]
    if match_value.empty:
        return f'"{lookup_value}" not found!' if if_not_found == '' else if_not_found
    else:
        return match_value.tolist()[0]

df2.iloc[2,17] = df1["debt"].apply(xlookup, args = (main_df1["Fund name"],main_df1["fund_A"] ))
NameError: name 'main_df1' is not defined
Can anyone help correct the code?
Thanks a lot.

Iterating through the python and pandas loop

What is the best way to inspect how a loop behaves across iterations?
I have defined two functions that must run one after the other (the second takes the result of the first and processes it).
Ultimately I need a 2-row pandas dataframe as the output.
Sample code below.
def candle_data(
    figi,
    int1 = candle_resolution,
    from1 = previous_minutemark,
    to1 = last_minutemark
):
    response = market_api.market_candles_get(figi = ticker_figi_test, from_ = from1, to = to1, interval = int1)
    if response.status_code == 200:
        return response.parse_json().dict()
    else:
        return print(response.parse_error())

def response_to_pandas_df(response):
    df_candles = pd.DataFrame(response['payload'])
    df_candles = pd.json_normalize(df_candles['candles'])
    df_candles = df_candles[df_candles['time'] >= previous_minutemark]
    df_candles = df_candles[['c', 'figi','time']]
    df_candles_main = df_candles_template.append(df_candles)
    return df_candles_main
Then I call the functions in a loop:
ticker_figi_list = ["BBG000CL9VN6", "BBG000R7Z112"]
df_candles_main = df_candles_template
for figi in ticker_figi_list:
    response = candle_data(figi)
    df_candles_main = response_to_pandas_df(response)
But in return I get only 1 row of data, for the 1st FIGI in the list.
I suspect the cause is that candle_data() is defined with ticker_figi_test, which contains only one value, but I'm not sure how to work around this.
Thank you in advance.

It looks like the problem is that you are calling the API with figi = ticker_figi_test. I assume ticker_figi_test is equal to the first FIGI in your list, so you aren't actually calling the API with a different FIGI on each iteration. Try changing the call to the following:
response = market_api.market_candles_get(figi = figi, from_ = from1, to = to1, interval = int1)
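Beyond the figi argument, the loop as posted reassigns df_candles_main on every iteration, and response_to_pandas_df always appends to the empty template, so only the rows of the last FIGI would survive. DataFrame.append was also removed in pandas 2.0. A minimal sketch of one way to accumulate the rows is shown below; the candle_data stand-in and its payload are hypothetical, since the real API response is not shown in the question.

```python
import pandas as pd

def candle_data(figi):
    """Hypothetical stand-in for the API call: one fake candle per FIGI."""
    return {'payload': {'candles': [
        {'c': 100.0, 'figi': figi, 'time': '2021-01-01T10:00:00Z'},
    ]}}

def response_to_pandas_df(response):
    # Convert only this response; the caller combines the per-FIGI frames.
    df_candles = pd.json_normalize(response['payload']['candles'])
    return df_candles[['c', 'figi', 'time']]

ticker_figi_list = ["BBG000CL9VN6", "BBG000R7Z112"]

# Collect one frame per FIGI, then concatenate once at the end.
frames = [response_to_pandas_df(candle_data(figi)) for figi in ticker_figi_list]
df_candles_main = pd.concat(frames, ignore_index=True)
print(df_candles_main)
```

Concatenating once after the loop also avoids copying the growing frame on every iteration.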

Select row data if other row data is matched

I have a dataframe:
import pandas as pd
df = pd.read_csv('data.csv')
df.head()
       title                                             poster
0  Toy Story  https://images-na.ssl-images-amazon.com/images...
1    Jumanji  https://images-na.ssl-images-amazon.com/images...
I want to create a function that takes a movie title as input and returns the poster link as output. I tried the following, but it is not working:
def function_to_return_link(movie_name):
    if df['title'].str.contains(movie_name).any():
        print('Movie present in df')
        out = df.loc[df['title'] == movie_name]
        print(out)
    else:
        print('Movie is not present')
It shows the output as:
function_to_return_link('Toy Story')
Movie present in df
Empty DataFrame
Columns: [title, poster]
Index: []

df.loc[..., 'poster'] returns a pd.Series with your selected movie(s). Then, use pd.Series.iat to get the first value in the selection (by index). If the movie isn't present, then it raises an IndexError.
def function_to_return_link(movie_name):
    posters = df.loc[df['title'].str.contains(movie_name), 'poster']
    try:
        link = posters.iat[0]
    except IndexError:
        print('Movie is not present')
    else:
        return link
Note that this doesn't account for multiple matching entries. To deal with that, you could do the following (though it's arguably less pythonic than try/except).
def function_to_return_link(movie_name):
    posters = df.loc[df['title'].str.contains(movie_name), 'poster']
    if len(posters) > 1:
        print('Multiple hits')
    elif len(posters) == 0:
        print('Movie is not present')
    else:
        return posters.iat[0]
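The question's data is not shown, so the cause of the empty result is a guess. One possible reason is that the stored titles carry stray whitespace, in which case str.contains matches while == does not. A minimal sketch of an exact lookup that trims whitespace first, using made-up data and example.com URLs:

```python
import pandas as pd

# Hypothetical data: titles with stray whitespace (an assumption).
df = pd.DataFrame({
    'title': ['Toy Story ', ' Jumanji'],
    'poster': ['https://example.com/toy.jpg', 'https://example.com/jumanji.jpg'],
})

def function_to_return_link(movie_name):
    """Return the poster link for an exact, whitespace-trimmed title, else None."""
    mask = df['title'].str.strip() == movie_name.strip()
    posters = df.loc[mask, 'poster']
    return posters.iat[0] if not posters.empty else None

print(function_to_return_link('Toy Story'))
print(function_to_return_link('Cars'))
```

An exact comparison also avoids a side effect of str.contains, which treats the title as a regular expression by default, so titles containing characters such as parentheses would need regex=False.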

Here's one way you could do it:
def function_to_return_link(movie_name):
    # `in` on a Series checks the index labels, so test against the values
    if movie_name in df['title'].values:
        return df.query("title == @movie_name")['poster']
    else:
        print('Movie is not present')

adding multiple columns to a dataframe using df.apply and a lambda function

I am trying to add multiple columns to an existing dataframe with df.apply and a lambda function. I am able to add columns one by one but not able to do it for all the columns together.
My code
def get_player_stats(player_name):
    print(player_name)
    resp = requests.get(player_id_api + player_name)
    if resp.status_code != 200:
        # This means something went wrong.
        print('Error {}'.format(resp.status_code))
    result = resp.json()
    player_id = result['data'][0]['pid']
    resp_data = requests.get(player_data_api + str(player_id))
    if resp_data.status_code != 200:
        # This means something went wrong.
        print('Error {}'.format(resp_data.status_code))
    result_data = resp_data.json()
    check1 = len(result_data.get('data',None).get('batting',None))
    # print(check1)
    check2 = len(result_data.get('data',{}).get('batting',{}).get('ODIs',{}))
    # check2 = result_data.get(['data']['batting']['ODIs'],None)
    # print(check2)
    if check1 > 0 and check2 > 0:
        total_6s = result_data['data']['batting']['ODIs']['6s']
        total_4s = result_data['data']['batting']['ODIs']['4s']
        average = result_data['data']['batting']['ODIs']['Ave']
        total_innings = result_data['data']['batting']['ODIs']['Inns']
        total_catches = result_data['data']['batting']['ODIs']['Ct']
        total_stumps = result_data['data']['batting']['ODIs']['St']
        total_wickets = result_data['data']['bowling']['ODIs']['Wkts']
        print(average,total_innings,total_4s,total_6s,total_catches,total_stumps,total_wickets)
        return np.array([average,total_innings,total_4s,total_6s,total_catches,total_stumps,total_wickets])
    else:
        print('No data for player')
        return '','','','','','',''

cols = ['Avg','tot_inns','tot_4s','tot_6s','tot_cts','tot_sts','tot_wkts']
for col in cols:
    players_available[col] = ''
players_available[cols] = players_available.apply(lambda x: get_player_stats(x['playerName']) , axis =1)
I have tried adding the columns explicitly to the dataframe, but I still get an error:
ValueError: Must have equal len keys and value when setting with an iterable
Can someone help me with this?

It's tricky, because the behaviour of the apply method in pandas has changed across versions.
In my version (0.25.3), and in other recent versions, it works if the function returns a pd.Series object.
In your code, you could try changing the two return statements in the function:
# in the branch where data was found
return pd.Series([average,total_innings,total_4s,total_6s,
                  total_catches,total_stumps,total_wickets])
# in the else branch
return pd.Series(['','','','','','',''])
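A minimal sketch of this approach, with the API calls replaced by a hypothetical lookup table (fake_stats, the player names, and the shortened column list are made up for illustration):

```python
import pandas as pd

def get_player_stats(player_name):
    """Hypothetical stand-in for the API lookup: three stats per player."""
    fake_stats = {'A': [45.2, 120, 30], 'B': [38.9, 95, 12]}
    # Returning a Series lets apply(axis=1) expand it into one column per value.
    return pd.Series(fake_stats.get(player_name, ['', '', '']))

players_available = pd.DataFrame({'playerName': ['A', 'B', 'C']})
cols = ['Avg', 'tot_inns', 'tot_6s']

# apply returns a DataFrame with one row per player and columns 0, 1, 2.
stats = players_available.apply(lambda row: get_player_stats(row['playerName']), axis=1)
stats.columns = cols

# Attach the new columns by index alignment.
players_available = players_available.join(stats)
print(players_available)
```

If the function keeps returning a plain list or tuple, passing result_type='expand' to apply is another way to get one column per element.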

List index out of range error for loop on Json

I'm trying to request a JSON object, run through it with a for loop, take out the data I need, and save it to a model in Django.
I only want the first two runners (runner_1_name and runner_2_name), but the number of runners varies inside each list in my JSON object, so I keep getting a "list index out of range" error. I have tried using try and except, but when I save to the model I get an error saying my variables are referenced before assignment. What's the best way of ignoring the list index out of range error, or of fixing the list so the indexes are correct? I also want the code to run really fast, as I will be using this function as a background task that polls every two seconds.
@shared_task()
def mb_get_events():
    mb = APIClient('username' , 'pass')
    tennis_events = mb.market_data.get_events()
    for data in tennis_events:
        id = data['id']
        event_name = data['name']
        sport_id = data['sport-id']
        start_time = data['start']
        is_ip = data['in-running-flag']
        par = data['event-participants']
        event_id = par[0]['event-id']
        cat_id = data['meta-tags'][0]['id']
        cat_name = data['meta-tags'][0]['name']
        cat_type = data['meta-tags'][0]['type']
        url_name = data['meta-tags'][0]['type']
        try:
            runner_1_name = data['markets'][0]['runners'][0]['name']
        except IndexError:
            pass
        try:
            runner_2_name = data['markets'][0]['runners'][1]['name']
        except IndexError:
            pass
        run1_par_id = data['markets'][0]['runners'][0]['id']
        run2_par_id = data['markets'][0]['runners'][1]['id']
        run1_back_odds = data['markets'][0]['runners'][0]['prices'][0]['odds']
        run2_back_odds = data['markets'][0]['runners'][1]['prices'][0]['odds']
        run1_lay_odds = data['markets'][0]['runners'][0]['prices'][3]['odds']
        run2_lay_odds = data['markets'][0]['runners'][1]['prices'][3]['odds']
        te, created = MBEvent.objects.update_or_create(id=id)
        te.id = id
        te.event_name = event_name
        te.sport_id = sport_id
        te.start_time = start_time
        te.is_ip = is_ip
        te.event_id = event_id
        te.runner_1_name = runner_1_name
        te.runner_2_name = runner_2_name
        te.run1_back_odds = run1_back_odds
        te.run2_back_odds = run2_back_odds
        te.run1_lay_odds = run1_lay_odds
        te.run2_lay_odds = run2_lay_odds
        te.run1_par_id = run1_par_id
        te.run2_par_id = run2_par_id
        te.cat_id = cat_id
        te.cat_name = cat_name
        te.cat_type = cat_type
        te.url_name = url_name
        te.save()

Quick Fix:
try:
    runner_1_name = data['markets'][0]['runners'][0]['name']
except IndexError:
    runner_1_name = '' # don't just pass here
try:
    runner_2_name = data['markets'][0]['runners'][1]['name']
except IndexError:
    runner_2_name = ''
You get the "referenced before assignment" error because the except block just passes, so if the try fails, runner_1_name or runner_2_name is never defined. When you later use those variables, you get an error because they were never assigned. So in the except block, set the value either to a blank string or to some other string like 'Runner Does not Exists'.
Now if you want to totally avoid try/except and IndexError you can use if statements to check the length of markets and runners. Something like this:
runner_1_name = ''
runner_2_name = ''
# Make sure markets exists in data and its length is greater than 0 and runners exists in first market
if 'markets' in data and len(data['markets']) > 0 and 'runners' in data['markets'][0]:
    runners = data['markets'][0]['runners']
    # get runner 1
    if len(runners) > 0 and 'name' in runners[0]:
        runner_1_name = runners[0]['name']
    else:
        runner_1_name = 'Runner 1 does not exists'
    # get runner 2
    if len(runners) > 1 and 'name' in runners[1]:
        runner_2_name = runners[1]['name']
    else:
        runner_2_name = 'Runner 2 does not exists'
As you can see, this gets too long, and it's not the recommended way to do things.
You should just assume the data is all right, try to get the names, and use try/except to catch any errors, as suggested in my first code snippet.
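The question's code also reads the runner IDs and odds with the same kind of indexing, outside any try block, so those lines can raise IndexError as well. One way to keep the try/except approach without repeating it for every field is a small helper; safe_get below is a hypothetical name, and the sample event is made up for illustration.

```python
def safe_get(obj, *path, default=''):
    """Follow a path of keys/indexes through nested dicts and lists.

    Returns `default` as soon as any step is missing, instead of raising.
    """
    for step in path:
        try:
            obj = obj[step]
        except (IndexError, KeyError, TypeError):
            return default
    return obj

# Hypothetical event with a single runner, for illustration only.
data = {'markets': [{'runners': [{'name': 'Player A', 'id': 11}]}]}

runner_1_name = safe_get(data, 'markets', 0, 'runners', 0, 'name')
runner_2_name = safe_get(data, 'markets', 0, 'runners', 1, 'name')
run2_par_id = safe_get(data, 'markets', 0, 'runners', 1, 'id', default=None)
print(repr(runner_1_name), repr(runner_2_name), run2_par_id)
```

Every variable is assigned on each pass through the loop, so the "referenced before assignment" error cannot occur, and a missing runner shows up as the chosen default.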

I had the same issue with a list of comments that can be empty or hold an unknown number of comments.
My solution is to initialize a counter at 0 and run a while loop controlled by a boolean.
In the loop I try to get comments[count]; if that fails with IndexError, I set the boolean to False to stop the loop.
count = 0
condition_continue = True
while condition_continue:
    try:
        detailsCommentDict = comments[count]
        ....
        count += 1  # move on to the next comment
    except IndexError:
        # no comment at all or no more comment
        condition_continue = False
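When every element of the list has to be visited, a plain for loop covers the same cases without a counter or a flag: it runs zero times for an empty list and stops by itself after the last element. A minimal sketch with made-up comment data:

```python
# Hypothetical comment data, for illustration only.
comments = [
    {'author': 'ann', 'text': 'first'},
    {'author': 'bob', 'text': 'second'},
]

texts = []
for details_comment_dict in comments:  # body is skipped if comments is empty
    texts.append(details_comment_dict['text'])

print(texts)
```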
