I can't get pandas to union my dataframes properly - python
I'm trying to concat or append (neither is working) two dataframes that have the same columns. But instead of simply stacking them vertically, pandas keeps adding a second, empty copy of every column as well. Do you know how to stop this?
The output looks like this:
0,1,2,3,4,5,6,7,8,9,10,11,12,13,0,1,10,11,12,13,2,3,4,5,6,7,8,9
10/23/2020,New Castle,DE,Gary,IN,Full,Flatbed,0.00,46,48,0,Dispatch,(800) 488-1860,Meadow Lark Agency ,,,,,,,,,,,,,,
10/22/2020,Wilmington,DE,METHUEN,MA,Full,Flatbed / Step Deck,0.00,48,48,0,Ken,(903) 280-7878,UrTruckBroker ,,,,,,,,,,,,,,
10/23/2020,WILMINGTON,DE,METHUEN,MA,Full,Flatbed w/Tarps,0.00,47,1,0,Dispatch,(912) 748-3801,DSV Road Inc. ,,,,,,,,,,,,,,
10/23/2020,WILMINGTON,DE,METHUEN,MA,Full,Flatbed w/Tarps,0.00,48,1,0,Dispatch,(541) 826-4786,Sureway Transportation Co / Anderson Trucking Serv ,,,,,,,,,,,,,,
10/30/2020,New Castle,DE,Gary,IN,Full,Flatbed,945.00,46,48,0,Dispatch,(800) 488-1860,Meadow Lark Agency ,,,,,,,,,,,,,,
...
,,,,,,,,,,,,,,03/02/2021,Knapp,0.0,Dispatch,(763) 432-3680,Fuze Logistics Services USA ,WI,Jackson,NE,Full,Flatbed / Step Deck,0.0,48.0,48.0
,,,,,,,,,,,,,,03/02/2021,Knapp,0.0,Dispatch,(763) 432-3680,Fuze Logistics Services USA ,WI,Sterling,IL,Full,Flatbed / Step Deck,0.0,48.0,48.0
,,,,,,,,,,,,,,03/02/2021,Milwaukee,0.0,Dispatch,(763) 432-3680,Fuze Logistics Services USA ,WI,Great Falls,MT,Full,Flatbed / Step Deck,0.0,45.0,48.0
,,,,,,,,,,,,,,03/02/2021,Algoma,0.0,Dispatch,(763) 432-3680,Fuze Logistics Services USA ,WI,Pamplico,SC,Full,Flatbed / Step Deck,0.0,48.0,48.0
The code makes a web request to fetch the data, which I save to a dataframe; that dataframe is then concatenated with another dataframe read from a CSV, and the result is written back to that CSV:
import json

import pandas as pd
import requests

this_csv = 'freights_trulos.csv'
try:
    old_df = pd.read_csv(this_csv)
except BaseException as e:
    print(e)
    old_df = pd.DataFrame()

state, equip = 'DE', 'Flat'
url = "https://backend-a.trulos.com/load-table/grab_loads.php?state=%s&equipment=%s" % (state, equip)
payload = {}
headers = {
    ...
}
response = requests.request("GET", url, headers=headers, data=payload)
# print(response.text)
parsed = json.loads(response.content)
# keep the first 13 fields and pull the company name out of the HTML markup in the 4th-from-last field
data = [r[0:13] + [r[-4].split('<br/>')[-2].split('>')[-1]] for r in parsed]
df = pd.DataFrame(data=data)

if not old_df.empty:
    # concatenate old and new and remove duplicates
    # df.reset_index(drop=True, inplace=True)
    # old_df.reset_index(drop=True, inplace=True)
    # df = pd.concat([old_df, df], ignore_index=True)  <--- CONCAT HAS SAME ISSUES AS APPEND
    df = df.append(old_df, ignore_index=True)
    # remove duplicates on cols (drop_duplicates returns a new frame, so assign it back)
    df = df.drop_duplicates()

df.to_csv(this_csv, index=False)
EDIT: the two dataframes being appended have different dtypes:
df.dtypes
Out[2]:
0 object
1 object
2 object
3 object
4 object
5 object
6 object
7 object
8 object
9 object
10 object
11 object
12 object
13 object
dtype: object
old_df.dtypes
Out[3]:
0 object
1 object
2 object
3 object
4 object
5 object
6 object
7 float64
8 int64
9 int64
10 int64
11 object
12 object
13 object
dtype: object
old_df written to CSV:
0,1,2,3,4,5,6,7,8,9,10,11,12,13
10/23/2020,New Castle,DE,Gary,IN,Full,Flatbed,0.0,46,48,0,Dispatch,(800) 488-1860,Meadow Lark Agency
10/22/2020,Wilmington,DE,METHUEN,MA,Full,Flatbed / Step Deck,0.0,48,48,0,Ken,(903) 280-7878,UrTruckBroker
10/23/2020,WILMINGTON,DE,METHUEN,MA,Full,Flatbed w/Tarps,0.0,47,1,0,Dispatch,(912) 748-3801,DSV Road Inc.
10/23/2020,WILMINGTON,DE,METHUEN,MA,Full,Flatbed w/Tarps,0.0,48,1,0,Dispatch,(541) 826-4786,Sureway Transportation Co / Anderson Trucking Serv
10/30/2020,New Castle,DE,Gary,IN,Full,Flatbed,945.0,46,48,0,Dispatch,(800) 488-1860,Meadow Lark Agency
new_df written to CSV:
0,1,2,3,4,5,6,7,8,9,10,11,12,13
10/23/2020,New Castle,DE,Gary,IN,Full,Flatbed,0.00,46,48,0,Dispatch,(800) 488-1860,Meadow Lark Agency
10/22/2020,Wilmington,DE,METHUEN,MA,Full,Flatbed / Step Deck,0.00,48,48,0,Ken,(903) 280-7878,UrTruckBroker
10/23/2020,WILMINGTON,DE,METHUEN,MA,Full,Flatbed w/Tarps,0.00,47,1,0,Dispatch,(912) 748-3801,DSV Road Inc.
10/23/2020,WILMINGTON,DE,METHUEN,MA,Full,Flatbed w/Tarps,0.00,48,1,0,Dispatch,(541) 826-4786,Sureway Transportation Co / Anderson Trucking Serv
10/30/2020,New Castle,DE,Gary,IN,Full,Flatbed,945.00,46,48,0,Dispatch,(800) 488-1860,Meadow Lark Agency
I guess the problem could be how you read the data. If I copy your sample data to Excel, split it by comma, and then import it to pandas, all is fine. But if I split on comma AND whitespace, I get +9 additional columns. So you could try debugging by replacing all whitespace before creating your dataframe.
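A minimal sketch of that idea, assuming data is the list of parsed string fields built in the question's code (replacing internal spaces with underscores mirrors the sample data I used below; it isn't required):

# hypothetical cleanup step: strip/replace whitespace in every field before building the frame
cleaned = [[str(field).strip().replace(' ', '_') for field in row] for row in data]
df = pd.DataFrame(data=cleaned)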
I also used your sample data and it worked just fine for me if I initialize it like this:
import pandas as pd
df_new = pd.DataFrame({'0': {0: '10/23/2020',
1: '10/22/2020',
2: '10/23/2020',
3: '10/23/2020',
4: '10/30/2020'},
'1': {0: 'New_Castle',
1: 'Wilmington',
2: 'WILMINGTON',
3: 'WILMINGTON',
4: 'New_Castle'},
'2': {0: 'DE', 1: 'DE', 2: 'DE', 3: 'DE', 4: 'DE'},
'3': {0: 'Gary', 1: 'METHUEN', 2: 'METHUEN', 3: 'METHUEN', 4: 'Gary'},
'4': {0: 'IN', 1: 'MA', 2: 'MA', 3: 'MA', 4: 'IN'},
'5': {0: 'Full', 1: 'Full', 2: 'Full', 3: 'Full', 4: 'Full'},
'6': {0: 'Flatbed',
1: 'Flatbed_/_Step_Deck',
2: 'Flatbed_w/Tarps',
3: 'Flatbed_w/Tarps',
4: 'Flatbed'},
'7': {0: 0.0, 1: 0.0, 2: 0.0, 3: 0.0, 4: 945.0},
'8': {0: 46, 1: 48, 2: 47, 3: 48, 4: 46},
'9': {0: 48, 1: 48, 2: 1, 3: 1, 4: 48},
'10': {0: 0, 1: 0, 2: 0, 3: 0, 4: 0},
'11': {0: 'Dispatch', 1: 'Ken', 2: 'Dispatch', 3: 'Dispatch', 4: 'Dispatch'},
'12': {0: '(800)_488-1860',
1: '(903)_280-7878',
2: '(912)_748-3801',
3: '(541)_826-4786',
4: '(800)_488-1860'},
'13': {0: 'Meadow_Lark_Agency_',
1: 'UrTruckBroker_',
2: 'DSV_Road_Inc._',
3: 'Sureway_Transportation_Co_/_Anderson_Trucking_Serv_',
4: 'Meadow_Lark_Agency_'}})
df_old = pd.DataFrame({'0': {0: '10/23/2020',
1: '10/22/2020',
2: '10/23/2020',
3: '10/23/2020',
4: '10/30/2020'},
'1': {0: 'New_Castle',
1: 'Wilmington',
2: 'WILMINGTON',
3: 'WILMINGTON',
4: 'New_Castle'},
'2': {0: 'DE', 1: 'DE', 2: 'DE', 3: 'DE', 4: 'DE'},
'3': {0: 'Gary', 1: 'METHUEN', 2: 'METHUEN', 3: 'METHUEN', 4: 'Gary'},
'4': {0: 'IN', 1: 'MA', 2: 'MA', 3: 'MA', 4: 'IN'},
'5': {0: 'Full', 1: 'Full', 2: 'Full', 3: 'Full', 4: 'Full'},
'6': {0: 'Flatbed',
1: 'Flatbed_/_Step_Deck',
2: 'Flatbed_w/Tarps',
3: 'Flatbed_w/Tarps',
4: 'Flatbed'},
'7': {0: 0.0, 1: 0.0, 2: 0.0, 3: 0.0, 4: 945.0},
'8': {0: 46, 1: 48, 2: 47, 3: 48, 4: 46},
'9': {0: 48, 1: 48, 2: 1, 3: 1, 4: 48},
'10': {0: 0, 1: 0, 2: 0, 3: 0, 4: 0},
'11': {0: 'Dispatch', 1: 'Ken', 2: 'Dispatch', 3: 'Dispatch', 4: 'Dispatch'},
'12': {0: '(800)_488-1860',
1: '(903)_280-7878',
2: '(912)_748-3801',
3: '(541)_826-4786',
4: '(800)_488-1860'},
'13': {0: 'Meadow_Lark_Agency_',
1: 'UrTruckBroker_',
2: 'DSV_Road_Inc._',
3: 'Sureway_Transportation_Co_/_Anderson_Trucking_Serv_',
4: 'Meadow_Lark_Agency_'}})
df_new.append(df_old, ignore_index=True)
#OR
pd.concat([df_new, df_old])
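One caveat: DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0, so on current pandas versions only the concat form works:

pd.concat([df_new, df_old], ignore_index=True)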
Related
Custom function to replace missing values in dataframe with median located in pivot table
I am attempting to write a function to replace missing values in the 'total_income' column with the median 'total_income' provided by the pivot table, using the row's 'education' and 'income_type' to index the pivot table. I want to populate using these medians so that the values are as optimal as they can be.

This is the first 5 rows of the dataframe as a dictionary:

{'index': {0: 0, 1: 1, 2: 2, 3: 3, 4: 4},
 'children': {0: 1, 1: 1, 2: 0, 3: 3, 4: 0},
 'days_employed': {0: 8437.673027760233, 1: 4024.803753850451, 2: 5623.422610230956, 3: 4124.747206540018, 4: 340266.07204682194},
 'dob_years': {0: 42, 1: 36, 2: 33, 3: 32, 4: 53},
 'education': {0: "bachelor's degree", 1: 'secondary education', 2: 'secondary education', 3: 'secondary education', 4: 'secondary education'},
 'education_id': {0: 0, 1: 1, 2: 1, 3: 1, 4: 1},
 'family_status': {0: 'married', 1: 'married', 2: 'married', 3: 'married', 4: 'civil partnership'},
 'family_status_id': {0: 0, 1: 0, 2: 0, 3: 0, 4: 1},
 'gender': {0: 'F', 1: 'F', 2: 'M', 3: 'M', 4: 'F'},
 'income_type': {0: 'employee', 1: 'employee', 2: 'employee', 3: 'employee', 4: 'retiree'},
 'debt': {0: 0, 1: 0, 2: 0, 3: 0, 4: 0},
 'total_income': {0: 40620.102, 1: 17932.802, 2: 23341.752, 3: 42820.568, 4: 25378.572},
 'purpose': {0: 'purchase of the house', 1: 'car purchase', 2: 'purchase of the house', 3: 'supplementary education', 4: 'to have a wedding'},
 'age_group': {0: 'adult', 1: 'adult', 2: 'adult', 3: 'adult', 4: 'older adult'}}

Here is what I am testing:

def fill_income(row):
    total_income = row['total_income']
    age_group = row['age_group']
    income_type = row['income_type']
    education = row['education']
    table = df.pivot_table(index=['age_group', 'income_type'], columns='education', values='total_income', aggfunc='median')
    if total_income == 'NaN':
        if age_group == 'adult':
            return table.loc[education, income_type]

My desired output is the pivot table value (the median total_income) for the dataframe row's given education and income_type. When I test it, it returns 'None'. Thanks in advance for your time helping me with this problem!
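A sketch of one way this could work, not from the original question: it assumes the missing incomes are real NaN values (comparing to the string 'NaN' never matches them, which is presumably why the function returns None) and that the pivot table is indexed by (age_group, income_type) with education as the columns, as in the function above:

import pandas as pd

# df is the dataframe built from the dictionary above
table = df.pivot_table(index=['age_group', 'income_type'], columns='education',
                       values='total_income', aggfunc='median')

def fill_income(row):
    # pd.isna() catches real missing values; row['total_income'] == 'NaN' would not
    if pd.isna(row['total_income']):
        # pivot rows are keyed by the (age_group, income_type) pair, columns by education
        return table.loc[(row['age_group'], row['income_type']), row['education']]
    return row['total_income']

df['total_income'] = df.apply(fill_income, axis=1)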
Python conditional lookup
I have a transactional table and a lookup table as below. I need to add the val field from df_lkp to df_txn by lookup. For each record of df_txn, I need to loop through df_lkp. If the grp field value is a, then compare only field a in both tables and get the match. If the grp value is ab, then compare fields a and b in both tables. If it is abc, then fields a, b and c should be compared to fetch val, and so on. Is there a way this could be done in pandas without a for-loop?

df_txn = pd.DataFrame({'id': {0: '1', 1: '2', 2: '3', 3: '4', 4: '5', 5: '6', 6: '7'},
                       'amt': {0: 100, 1: 200, 2: 300, 3: 400, 4: 500, 5: 600, 6: 700},
                       'a': {0: '226', 1: '227', 2: '248', 3: '236', 4: '248', 5: '236', 6: '236'},
                       'b': {0: '0E31', 1: '0E32', 2: '0E40', 3: '0E35', 4: '0E40', 5: '0E40', 6: '0E33'},
                       'c': {0: '3014', 1: '3015', 2: '3016', 3: '3016', 4: '3016', 5: '3016', 6: '3016'}})

df_lkp = pd.DataFrame({'a': {0: '226', 1: '227', 2: '236', 3: '237', 4: '248'},
                       'b': {0: '0E31', 1: '0E32', 2: '0E33', 3: '0E35', 4: '0E40'},
                       'c': {0: '3014', 1: '3015', 2: '3016', 3: '3018', 4: '3019'},
                       'grp': {0: 'a', 1: 'ab', 2: 'abc', 3: 'b', 4: 'bc'},
                       'val': {0: 'KE00CH0004', 1: 'KE00CH0003', 2: 'KE67593065', 3: 'KE67593262', 4: 'KE00CH0003'}})

The expected output:

df_tx2 = pd.DataFrame({'id': {0: '1', 1: '2', 2: '3', 3: '4', 4: '5', 5: '6', 6: '7'},
                       'amt': {0: 100, 1: 200, 2: 300, 3: 400, 4: 500, 5: 600, 6: 700},
                       'a': {0: '226', 1: '227', 2: '248', 3: '236', 4: '248', 5: '236', 6: '236'},
                       'b': {0: '0E31', 1: '0E32', 2: '0E40', 3: '0E35', 4: '0E40', 5: '0E40', 6: '0E33'},
                       'c': {0: '3014', 1: '3015', 2: '3016', 3: '3016', 4: '3016', 5: '3016', 6: '3016'},
                       'val': {0: 'KE00CH0004', 1: 'KE00CH0003', 2: '', 3: '', 4: '', 5: '', 6: 'KE67593065'}})
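One possible sketch, not verified against the expected df_tx2 above: do one merge per distinct grp value, so the loop runs over the handful of lookup groups rather than over rows. The first-match-wins tie-break and the empty-string fill are assumptions on my part:

import pandas as pd

# df_txn and df_lkp as defined above
pieces = []
for grp, chunk in df_lkp.groupby('grp'):
    keys = list(grp)  # e.g. 'ab' -> ['a', 'b']; only these columns are compared
    pieces.append(df_txn.merge(chunk[keys + ['val']], on=keys, how='inner')[['id', 'val']])

matches = pd.concat(pieces).drop_duplicates('id')  # keep one match per transaction
df_tx2 = df_txn.merge(matches, on='id', how='left').fillna({'val': ''})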
In Python pandas, how do I ignore invalid values when I convert columns from hex to decimal?
When I use df[["Type 2", "Type 4"]].applymap(lambda n: int(n, 16)), it stops with an error because of invalid values in the Type 2 column (negative values, NaN, strings, ...) that cannot be converted from hex. How can I ignore this error or mark the invalid values as zero?

{'Type 1': {0: 1, 1: 3, 2: 5, 3: 7, 4: 9, 5: 11, 6: 13, 7: 15, 8: 17},
 'Type 2': {0: 'AA', 1: 'BB', 2: 'NaN', 3: '55', 4: '3.14', 5: '-96', 6: 'String', 7: 'FFFFFF', 8: 'FEEE'},
 'Type 3': {0: 0, 1: 0, 2: 0, 3: 0, 4: 0, 5: 0, 6: 0, 7: 0, 8: 0},
 'Type 4': {0: '23', 1: 'fefe', 2: 'abcd', 3: 'dddd', 4: 'dad', 5: 'cfe', 6: 'cf42', 7: '321', 8: '0'},
 'Type 5': {0: -120, 1: -120, 2: -120, 3: -120, 4: -120, 5: -120, 6: -120, 7: -120, 8: -120}}
You can create a personalized function that handles this exception to use in your lambda. For example:

def lambda_int(n):
    try:
        return int(n, 16)
    except ValueError:
        return 0

df[["Type 2", "Type 4"]] = df[["Type 2", "Type 4"]].applymap(lambda n: lambda_int(n))
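One caveat the answer above doesn't mention: if the column ever holds a real NaN (a float) rather than the string 'NaN', int(n, 16) raises TypeError rather than ValueError, so a slightly broader except clause is safer (a sketch, not part of the original answer):

def lambda_int(n):
    try:
        return int(n, 16)
    except (TypeError, ValueError):  # TypeError covers real NaN floats, ValueError covers strings like '3.14'
        return 0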
Please go through this; I reconstructed your question and give steps to follow.

1. The first dictionary you provided does not have a NaN value, it has the string "NaN":

data = {'Type 1': {0: 1, 1: 3, 2: 5, 3: 7, 4: 9, 5: 11, 6: 13, 7: 15, 8: 17},
        'Type 2': {0: 'AA', 1: 'BB', 2: 'NaN', 3: '55', 4: '3.14', 5: '-96', 6: 'String', 7: 'FFFFFF', 8: 'FEEE'},
        'Type 3': {0: 0, 1: 0, 2: 0, 3: 0, 4: 0, 5: 0, 6: 0, 7: 0, 8: 0},
        'Type 4': {0: '23', 1: 'fefe', 2: 'abcd', 3: 'dddd', 4: 'dad', 5: 'cfe', 6: 'cf42', 7: '321', 8: '0'},
        'Type 5': {0: -120, 1: -120, 2: -120, 3: -120, 4: -120, 5: -120, 6: -120, 7: -120, 8: -120}}

import pandas as pd
df = pd.DataFrame(data)
df.head()

To check for NaN in your df and remove them:

columns_with_na = df.isna().sum()
# filter starting from 1 missing value
columns_with_na = columns_with_na[columns_with_na != 0]
print(len(columns_with_na))
# print them in descending order
print(len(columns_with_na.sort_values(ascending=False)))

This prints 0 and 0 because there is no NaN.

2. I reconstructed your data to include a NaN by using numpy.nan:

import numpy as np
# recreated the dataset and included a nan value: np.nan at Type 2
data = {'Type 1': {0: 1, 1: 3, 2: 5, 3: 7, 4: 9, 5: 11, 6: 13, 7: 15, 8: 17},
        'Type 2': {0: 'AA', 1: 'BB', 2: np.nan, 3: '55', 4: '3.14', 5: '-96', 6: 'String', 7: 'FFFFFF', 8: 'FEEE'},
        'Type 3': {0: 0, 1: 0, 2: 0, 3: 0, 4: 0, 5: 0, 6: 0, 7: 0, 8: 0},
        'Type 4': {0: '23', 1: 'fefe', 2: 'abcd', 3: 'dddd', 4: 'dad', 5: 'cfe', 6: 'cf42', 7: '321', 8: '0'},
        'Type 5': {0: -120, 1: -120, 2: -120, 3: -120, 4: -120, 5: -120, 6: -120, 7: -120, 8: -120}}
df2 = pd.DataFrame(data)
df2.head()

# sum up number of columns with nan
columns_with_na = df2.isna().sum()
# filter starting from 1 missing value
columns_with_na = columns_with_na[columns_with_na != 0]
print(len(columns_with_na))
print(len(columns_with_na.sort_values(ascending=False)))

This prints 1 and 1 because there is a NaN in the Type 2 column.

3. Drop the NaN values:

df2 = df2.dropna(how='any')
# sum up number of columns with nan
columns_with_na = df2.isna().sum()
columns_with_na = columns_with_na[columns_with_na != 0]
print(len(columns_with_na))  # prints 0 because I dropped all the nan values
df2.head()

4. To fill NaN in the whole df with 0, use:

df2.fillna(0, inplace=True)

To fill NaN with 0 in df2['Type 2'] only:

# if you don't want to change the original dataframe, set inplace to False
df2['Type 2'].fillna(0, inplace=True)
How to convert if/else to np.where in pandas
My code below applies pd.to_numeric to the columns that are supposed to be int or float but are coming in as object. Can we convert this to a more pandas-like approach, for example using np.where?

if df.dtypes.all() == 'object':
    df = df.apply(pd.to_numeric, errors='coerce').fillna(df)
else:
    df = df
A simple one-liner is assign with select_dtypes, which will reassign the existing columns:

df.assign(**df.select_dtypes('O').apply(pd.to_numeric, errors='coerce').fillna(df))

With np.where:

df[:] = np.where(df.dtypes == 'object',
                 df.apply(pd.to_numeric, errors='coerce').fillna(df),
                 df)

Example (check the Price column):

d = {'CusID': {0: 1, 1: 2, 2: 3},
     'Name': {0: 'Paul', 1: 'Mark', 2: 'Bill'},
     'Shop': {0: 'Pascal', 1: 'Casio', 2: 'Nike'},
     'Price': {0: '24000', 1: 'a', 2: '900'}}
df = pd.DataFrame(d)
print(df)

   CusID  Name    Shop  Price
0      1  Paul  Pascal  24000
1      2  Mark   Casio      a
2      3  Bill    Nike    900

df.to_dict()
{'CusID': {0: 1, 1: 2, 2: 3}, 'Name': {0: 'Paul', 1: 'Mark', 2: 'Bill'}, 'Shop': {0: 'Pascal', 1: 'Casio', 2: 'Nike'}, 'Price': {0: '24000', 1: 'a', 2: '900'}}

df.assign(**df.select_dtypes('O').apply(pd.to_numeric, errors='coerce').fillna(df)).to_dict()
{'CusID': {0: 1, 1: 2, 2: 3}, 'Name': {0: 'Paul', 1: 'Mark', 2: 'Bill'}, 'Shop': {0: 'Pascal', 1: 'Casio', 2: 'Nike'}, 'Price': {0: 24000.0, 1: 'a', 2: 900.0}}
The equivalent of your if/else is df.mask:

df_out = df.mask(df.dtypes == 'O',
                 df.apply(pd.to_numeric, errors='coerce').fillna(df))
Apply function across pandas dataframe columns
This seems to have been similarly answered elsewhere, but I can't get it to work. I have a pandas DataFrame that looks like sig_vars below. This df has a VAF and a Background column. I would like to use the ztest function from statsmodels to assign a p-value to a new p-value column. The p-value is calculated something like this for each row:

from statsmodels.stats.weightstats import ztest
p_value = ztest(sig_vars.Background, value=sig_vars.VAF)[1]

I have tried something like this, but I can't quite get it to work:

def calc(x):
    return ztest(x.Background, value=x.VAF.astype(float))[1]

sig_vars.dropna().assign(pval=lambda x: calc(x)).head()

It seems strange to me that this works just fine, however:

def calc(x):
    return ztest([0.0001, 0.0002, 0.0001], value=x.VAF.astype(float))[1]

sig_vars.dropna().assign(pval=lambda x: calc(x)).head()

Here is my DataFrame sig_vars:

sig_vars = pd.DataFrame({'AO': {0: 4.0, 1: 16.0, 2: 12.0, 3: 19.0, 4: 2.0},
                         'Background': {0: nan, 1: [0.00018832391713747646, 0.0002114408734430263, 0.000247843759294141], 2: nan, 3: [0.00023965141612200435, 0.00018864365214110544, 0.00036566589684372596, 0.0005452562704471102], 4: [0.00017349063150589867]},
                         'Change': {0: 'T>A', 1: 'T>C', 2: 'T>A', 3: 'T>C', 4: 'C>A'},
                         'Chrom': {0: 'chr1', 1: 'chr1', 2: 'chr1', 3: 'chr1', 4: 'chr1'},
                         'ConvChange': {0: 'T>A', 1: 'T>C', 2: 'T>A', 3: 'T>C', 4: 'C>A'},
                         'DP': {0: 16945.0, 1: 16945.0, 2: 16969.0, 3: 16969.0, 4: 16969.0},
                         'Downstream': {0: 'NaN', 1: 'NaN', 2: 'NaN', 3: 'NaN', 4: 'NaN'},
                         'Gene': {0: 'TIIIa', 1: 'TIIIa', 2: 'TIIIa', 3: 'TIIIa', 4: 'TIIIa'},
                         'ID': {0: '86.fastq/onlyProbedRegions.vcf', 1: '86.fastq/onlyProbedRegions.vcf', 2: '86.fastq/onlyProbedRegions.vcf', 3: '86.fastq/onlyProbedRegions.vcf', 4: '86.fastq/onlyProbedRegions.vcf'},
                         'Individual': {0: 1, 1: 1, 2: 1, 3: 1, 4: 1},
                         'IntEx': {0: 'TIII', 1: 'TIII', 2: 'TIII', 3: 'TIII', 4: 'TIII'},
                         'Loc': {0: 115227854, 1: 115227854, 2: 115227855, 3: 115227855, 4: 115227856},
                         'Upstream': {0: 'NaN', 1: 'NaN', 2: 'NaN', 3: 'NaN', 4: 'NaN'},
                         'VAF': {0: 0.00023605783416937148, 1: 0.0009442313366774859, 2: 0.0007071719017031057, 3: 0.0011196888443632507, 4: 0.00011786198361718427},
                         'Var': {0: 'A', 1: 'C', 2: 'A', 3: 'C', 4: 'A'},
                         'WT': {0: 'T', 1: 'T', 2: 'T', 3: 'T', 4: 'C'}})
Try this:

def calc(x):
    return ztest(x['Background'], value=float(x['VAF']))[1]

sig_vars['pval'] = sig_vars.dropna().apply(calc, axis=1)
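Presumably the reason the assign version fails (my reading, not something stated in the answer): the lambda inside assign receives the whole DataFrame, so x.Background is an entire column of lists and NaNs, which ztest cannot digest, whereas apply(..., axis=1) hands calc one row at a time. A quick way to see the difference:

# what each callable actually receives (hypothetical check)
print(type(sig_vars.dropna()))          # DataFrame: this is what the assign lambda gets as x
print(type(sig_vars.dropna().iloc[0]))  # Series (one row): this is what calc gets with apply(axis=1)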