I'm trying to join two dataframes in Pandas.
The first frame is called Trades and has these columns:
TRADE DATE
ACCOUNT
COMPANY
COST CENTER
CURRENCY
The second frame is called Company_Mapping and has these columns:
ACTUAL_COMPANY_ID
MAPPED_COMPANY_ID
I'm trying to join them with this code:
trade_df = pd.merge(left=Trades, right=Company_Mapping, how='left', left_on='COMPANY', right_on='ACTUAL_COMPANY_ID')
This returns:
KeyError: 'COMPANY'
I've double checked the spelling and COMPANY is clearly in Trades, and I have no clue what would cause this.
Any ideas?
Thanks!
Your Trades dataframe has a single column with all the intended column names mashed together into a single string. Check the code that parses your file.
Make sure you read your file with the right separator.
df = pd.read_csv("file.csv", sep=';')
or
df = pd.read_csv("file.csv", sep=',')
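To see whether the separator is the culprit, it can help to print the column names right after reading. Here is a minimal sketch, assuming a semicolon-separated file; the file contents below are made up for illustration and stand in for the real file.

```python
import io
import pandas as pd

# Hypothetical semicolon-separated file contents, standing in for the real file.
raw = "TRADE DATE;ACCOUNT;COMPANY\n2020-01-02;A1;C1\n"

# Read with the default comma separator: the whole header becomes one column.
wrong = pd.read_csv(io.StringIO(raw))
print(list(wrong.columns))   # ['TRADE DATE;ACCOUNT;COMPANY']

# Read with the matching separator: the columns are split as intended.
right = pd.read_csv(io.StringIO(raw), sep=";")
print(list(right.columns))   # ['TRADE DATE', 'ACCOUNT', 'COMPANY']
```

If the first print shows a single long name, the merge key 'COMPANY' really is missing, which would explain the KeyError.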
Essentially, pandas raises a KeyError when the column name you ask for does not exist. For example, df.loc[df['one']==10] fails if there is no column named 'one'. If the column does exist and you still get the same error, try wrapping the lookup in a try/except statement; that is what solved my problem.
For example:
try:
    df_new = df.loc[df['one']==10]
except KeyError:
    print('KeyError: column not found')
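An alternative to catching the exception is to check for the column before using it. This is only a sketch; the frame below is hypothetical and deliberately lacks a 'one' column.

```python
import pandas as pd

# Hypothetical frame that has no 'one' column.
df = pd.DataFrame({"two": [10, 20]})

if "one" in df.columns:
    df_new = df.loc[df["one"] == 10]
else:
    # Fall back to an empty frame with the same columns and report what exists.
    df_new = df.iloc[0:0]
    print("Column 'one' not found; available columns:", list(df.columns))
```

Printing the available columns usually shows the real cause (a typo, stray whitespace, or a badly parsed header) rather than hiding it.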
Just in case someone has the same problem, sometimes you need to transpose your dataframe:
import pandas as pd
df = pd.read_csv('file.csv')
# A B C
# -------
# 1 2 3
# 4 5 6
new_df = pd.DataFrame([df['A'], df['B']])
# A | 1 4
# B | 2 5
new_df['A'] # KeyError
new_df = new_df.T
# A B
# ---
# 1 2
# 4 5
new_df['A'] # works after transposing
# A
# -
# 1
# 4
I am trying to read a column in python, and create a new column using python.
import pandas as pd
df = pd.read_csv (r'C:\Users\User\Documents\Research\seqadv.csv')
print (df)
df = pd.DataFrame(data={'WT_RESIDUE':['']})
codes = {'ALA':'A', 'ARG':'R', 'ASN':'N', 'ASP':'D', 'CYS':'C', 'GLU':'E', 'GLN':'Q', 'GLY':'G', 'HIS':'H', 'ILE':'I', 'LEU':'L', 'LYS':'K', 'MET':'M', 'PHE':'F', 'PRO':'P', 'SER':'S', 'THR':'T', 'TRP':'W', 'TYR':'Y', 'VAL':'V'}
df['MUTATION_CODE'] = df['WT_RESIDUE'].replace(codes)
df.to_csv (r'C:\Users\User\Documents\Research\output.csv')
I tried this, but it will not create a new column no matter what I do.
It seems like you made a silly mistake
import pandas as pd
df = pd.read_csv (r'C:\Users\User\Documents\Research\seqadv.csv')
print (df)
df = pd.DataFrame(data={'WT_RESIDUE':['']}) # Why do you have this line?
codes = {'ALA':'A', 'ARG':'R', 'ASN':'N', 'ASP':'D', 'CYS':'C', 'GLU':'E', 'GLN':'Q', 'GLY':'G', 'HIS':'H', 'ILE':'I', 'LEU':'L', 'LYS':'K', 'MET':'M', 'PHE':'F', 'PRO':'P', 'SER':'S', 'THR':'T', 'TRP':'W', 'TYR':'Y', 'VAL':'V'}
df['MUTATION_CODE'] = df['WT_RESIDUE'].replace(codes)
df.to_csv (r'C:\Users\User\Documents\Research\output.csv')
Try removing the line with the comment. AFAIK, it is reinitializing your DataFrame and thus the WT_RESIDUE column becomes empty.
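For reference, here is a small sketch of what the script does once the data is no longer overwritten. The frame below is a hypothetical stand-in for the contents of seqadv.csv, and the mapping is shortened for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the contents of seqadv.csv.
df = pd.DataFrame({"WT_RESIDUE": ["ALA", "GLY", "XYZ"]})
codes = {"ALA": "A", "GLY": "G"}  # shortened mapping for illustration

# replace() swaps known values and leaves unknown ones unchanged.
df["MUTATION_CODE"] = df["WT_RESIDUE"].replace(codes)
print(df)
```

Note that replace keeps values that are not in the dictionary (here 'XYZ') as they are.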
Consider a sample of the provided input.
We can use the map function to look up each value of the existing column in the dict and store the corresponding values in a new column. Values with no matching key become NaN.
df = pd.DataFrame({
    'WT_RESIDUE':['ALA', "REMARK", 'VAL', "LYS"]
})
codes = {'ALA':'A', 'ARG':'R', 'ASN':'N', 'ASP':'D', 'CYS':'C', 'GLU':'E', 'GLN':'Q', 'GLY':'G', 'HIS':'H', 'ILE':'I', 'LEU':'L', 'LYS':'K', 'MET':'M', 'PHE':'F', 'PRO':'P', 'SER':'S', 'THR':'T', 'TRP':'W', 'TYR':'Y', 'VAL':'V'}
df['MUTATION_CODE'] = df.WT_RESIDUE.map(codes)
Input
  WT_RESIDUE
0        ALA
1     REMARK
2        VAL
3        LYS
Output
  WT_RESIDUE MUTATION_CODE
0        ALA             A
1     REMARK           NaN
2        VAL             V
3        LYS             K
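If the NaN for unmapped values is not wanted, one option is to chain fillna after map. This is a sketch only; the '?' placeholder is an assumption, not something from the question, and the mapping is shortened.

```python
import pandas as pd

df = pd.DataFrame({"WT_RESIDUE": ["ALA", "REMARK", "VAL", "LYS"]})
codes = {"ALA": "A", "VAL": "V", "LYS": "K"}  # shortened mapping for illustration

# map() returns NaN for keys missing from the dict; fillna() swaps in a placeholder.
df["MUTATION_CODE"] = df["WT_RESIDUE"].map(codes).fillna("?")
print(df)
```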
I am trying to have Python Pandas compare two dataframes with each other. In dataframe 1, I have two columns (AC-Cat and Origin). I am trying to compare the AC-Cat column with the values in dataframe 2. If a match is found between one of the columns of dataframe 2 and the value of dataframe 1 being studied, I want Pandas to copy the header of the dataframe 2 column in which the match is found to a new column in dataframe 1.
DF1:
f = {'AC-Cat': pd.Series(['B737', 'A320', 'MD11']),
'Origin': pd.Series(['AJD', 'JFK', 'LRO'])}
Flight_df = pd.DataFrame(f)
DF2:
w = {'CAT-C': pd.Series(['DC85', 'IL76', 'MD11', 'TU22', 'TU95']),
'CAT-D': pd.Series(['A320', 'A321', 'AN12', 'B736', 'B737'])}
WCat_df = pd.DataFrame(w)
I imported pandas as pd and numpy as np and tried to define a function to compare these columns.
def get_wake_cat(AC_cat):
    try:
        Wcat = [WCat_df.columns.values[0]][WCat_df.iloc[:,1]==AC_cat].values[0]
    except:
        Wcat = np.NAN
    return Wcat
Flight_df.loc[:,'CAT'] = Flight_df.loc[:,'AC-Cat'].apply(lambda CT: get_wake_cat(CT))
However, the function does not result in the desired outputs. For example: Take the B737 AC-Cat value. I want Python Pandas to then find this value in DF2 in the column CAT-D and copy this header to the new column of DF 1. This does not happen. Can someone help me find out why my code is not giving the desired results?
Not pretty but I think I got it working. Part of the error was that the function did not have WCat_df. I also changed the indexing into two steps:
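def get_wake_cat(AC_cat, WCat_df):
    try:
        # Keep only the cells equal to AC_cat; everything else becomes NaN.
        d=WCat_df[WCat_df.columns.values][WCat_df.iloc[:]==AC_cat]
        # Take the header of the first column that contains a match.
        Wcat=d.columns[(d==AC_cat).any()][0]
    except:
        Wcat = np.nan  # np.NAN was removed in NumPy 2.0
    return Wcat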
Then you need to change your next line to:
Flight_df.loc[:,'CAT'] = Flight_df.loc[:,'AC-Cat'].apply(lambda CT: get_wake_cat(CT,WCat_df ))
AC-Cat Origin CAT
0 B737 AJD CAT-D
1 A320 JFK CAT-D
2 MD11 LRO CAT-C
Hope that solves the problem
This will give you 2 new columns with the name(s) of the match(es) found:
Flight_df['CAT1'] = Flight_df['AC-Cat'].map(lambda x: 'CAT-C' if x in list(WCat_df['CAT-C']) else '')
Flight_df['CAT2'] = Flight_df['AC-Cat'].map(lambda x: 'CAT-D' if x in list(WCat_df['CAT-D']) else '')
Flight_df.loc[Flight_df['CAT1'] == '', 'CAT1'] = Flight_df['CAT2']
Flight_df.loc[Flight_df['CAT1'] == Flight_df['CAT2'], 'CAT2'] = ''
IIUC, you can do a stack and merge:
final=(Flight_df.merge(WCat_df.stack().reset_index(1,name='AC-Cat'),on='AC-Cat',how='left')
       .rename(columns={'level_1':'New'}))
print(final)
Or with melt:
final=Flight_df.merge(WCat_df.melt(var_name='New',value_name='AC-Cat'),
on='AC-Cat',how='left')
AC-Cat Origin New
0 B737 AJD CAT-D
1 A320 JFK CAT-D
2 MD11 LRO CAT-C
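Another way to get the same column, sketched here as an alternative to the merge, is to turn the melted frame into a plain dictionary and map it onto AC-Cat. The names long_form and lookup are made up for this example, and it assumes each aircraft type appears in only one category column.

```python
import pandas as pd

Flight_df = pd.DataFrame({"AC-Cat": ["B737", "A320", "MD11"],
                          "Origin": ["AJD", "JFK", "LRO"]})
WCat_df = pd.DataFrame({"CAT-C": ["DC85", "IL76", "MD11", "TU22", "TU95"],
                        "CAT-D": ["A320", "A321", "AN12", "B736", "B737"]})

# Long form: one row per (category header, aircraft type) pair.
long_form = WCat_df.melt(var_name="CAT", value_name="AC-Cat")

# Dictionary from aircraft type to the header of the column it was found in.
lookup = dict(zip(long_form["AC-Cat"], long_form["CAT"]))

# Types that are not in the dictionary would come out as NaN.
Flight_df["CAT"] = Flight_df["AC-Cat"].map(lookup)
print(Flight_df)
```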
I'm in the initial stages of doing some 'machine learning'.
I'm trying to create a new data frame and one of the columns doesn't appear to be recognised..?
I've loaded an Excel file with 2 columns (removed the index). All fine.
Code:
df = pd.read_excel('scores.xlsx',index=False)
df=df.rename(columns=dict(zip(df.columns,['Date','Amount'])))
df.index=df['Date']
df=df[['Amount']]
#creating dataframe
data = df.sort_index(ascending=True, axis=0)
new_data = pd.DataFrame(index=range(0,len(df)),columns=['Date','Amount'])
for i in range(0,len(data)):
    new_data['Date'][i] = data['Date'][i]
    new_data['Amount'][i] = data['Amount'][i]
The error:
KeyError: 'Date'
Not really sure what's the problem here.
Any help greatly appreciated
I think in line 4 you reduce your dataframe to just one column, "Amount".
To add to @Grzegorz Skibinski's answer, the problem is that after line 4 there is no longer a 'Date' column. The Date column was assigned to the index and then dropped from the columns, and while the index is named "Date", you can't use 'Date' as a key to get the index. You have to use data.index[i] instead of data['Date'][i].
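Here is a short sketch of that situation, using a small made-up frame instead of scores.xlsx. It also shows reset_index, which turns the named index back into a regular column and would make the copy loop unnecessary.

```python
import pandas as pd

# Hypothetical data standing in for the Excel file.
df = pd.DataFrame({"Date": ["2020-01-02", "2020-01-01"], "Amount": [5, 3]})
df.index = df["Date"]
df = df[["Amount"]]          # 'Date' now lives only in the index
data = df.sort_index(ascending=True)

print("Date" in data.columns)  # False, so data['Date'] raises KeyError
print(data.index.name)         # Date

# reset_index() moves the index back into a column named 'Date'.
new_data = data.reset_index()
print(list(new_data.columns))  # ['Date', 'Amount']
```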
It seems that you have an error in the formatting of your Date column.
To check that there is no mistake in the column names, you can print them:
import pandas as pd
# create data
data_dict = {}
data_dict['Fruit '] = ['Apple', 'Orange']
data_dict['Price'] = [1.5, 3.24]
# create dataframe from dict
df = pd.DataFrame.from_dict(data_dict)
# Print columns names
print(df.columns.values)
# Print "Fruit " column
print(df['Fruit '])
This code outputs:
['Fruit ' 'Price']
0 Apple
1 Orange
Name: Fruit , dtype: object
We can clearly see that the "Fruit " column has a trailing space. This is an easy mistake to make, especially when using Excel.
If you try to access "Fruit" instead of "Fruit ", you get the same kind of error you have:
KeyError: 'Fruit'
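If stray whitespace turns out to be the cause, one way to deal with it is to strip every column name once after loading. A minimal sketch, reusing the fruit example:

```python
import pandas as pd

df = pd.DataFrame({"Fruit ": ["Apple", "Orange"], "Price": [1.5, 3.24]})

# Remove leading and trailing whitespace from every column name.
df.columns = df.columns.str.strip()

print(list(df.columns))
print(df["Fruit"])  # the lookup works without the trailing space
```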
I'm trying to create a column called 'city_code' with values from the 'code' column. But in order to do this I need to check whether the 'ds_city' and 'city' values are equal.
Here is a table sample:
https://i.imgur.com/093GJF1.png
I've tried this:
def find_code(data):
    if data['ds_city'] == data['city'] :
        return data['code']
    else:
        return 'UNKNOWN'
df['code_city'] = df.apply(find_code, axis=1)
But since there are duplicates in the 'ds_city' column, this is the result:
https://i.imgur.com/geHyVUA.png
Here is an image of the expected result:
https://i.imgur.com/HqxMJ5z.png
How can I work around this?
You can use pandas merge:
df = pd.merge(df, df[['code', 'city']], how='left',
left_on='ds_city', right_on='city',
suffixes=('', '_right')).drop(columns='city_right')
# output:
# code city ds_city code_right
# 0 1500107 ABAETETUBA ABAETETUBA 1500107
# 1 2900207 ABARE ABAETETUBA 1500107
# 2 2100055 ACAILANDIA ABAETETUBA 1500107
# 3 2300309 ACOPIARA ABAETETUBA 1500107
# 4 5200134 ACREUNA ABARE 2900207
See the pandas.merge documentation for details. The call takes the input dataframe and left-joins its own code and city columns onto it wherever ds_city equals city.
The above code fills code_right with NaN when the city is not found. You can then replace those values with 'UNKNOWN':
df['code_right'] = df['code_right'].fillna('UNKNOWN')
This is more a job for np.where:
import numpy as np
df['code_city'] = np.where(df['ds_city'] == df['city'], df['code'], 'UNKNOWN')
You could try this out:
# Begin with a column of only 'UNKNOWN' values.
data['code_city'] = "UNKNOWN"
# Build the list of city names once instead of on every iteration.
cities = data['city'].tolist()
# Iterate through the cities in the ds_city column.
for i, lookup_city in enumerate(data['ds_city']):
    # Cities that are missing from the city column stay 'UNKNOWN'.
    if lookup_city in cities:
        # Note the row which contains the corresponding city name in the city column.
        row = cities.index(lookup_city)
        # Assign that row's code to the current row (assumes the default 0..n-1 index).
        data.loc[i, 'code_city'] = data['code'][row]
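The same lookup can also be written without an explicit loop by mapping ds_city through a Series indexed by city. This is a sketch with made-up sample rows; code_by_city is a name invented for the example, and the codes are converted to strings so they can share a column with 'UNKNOWN'.

```python
import pandas as pd

# Hypothetical sample rows in the shape described in the question.
df = pd.DataFrame({
    "code": [1500107, 2900207, 2100055],
    "city": ["ABAETETUBA", "ABARE", "ACAILANDIA"],
    "ds_city": ["ABAETETUBA", "ABAETETUBA", "NOWHERE"],
})

# Lookup Series: index is the city name, values are the codes as strings.
code_by_city = df.drop_duplicates("city").set_index("city")["code"].astype(str)

# Map each ds_city to its code; cities with no match become 'UNKNOWN'.
df["code_city"] = df["ds_city"].map(code_by_city).fillna("UNKNOWN")
print(df)
```

Dropping duplicate city names first keeps the lookup index unique, which map requires.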
I am a beginner in Python and I am getting an error while trying to drop values from a column in a pandas dataframe. I keep getting a KeyError after some time. Here is the code snippet:
for i in data['FilePath'].keys():
    if '.' not in data['FilePath'][i]:
        value = data['FilePath'][i]
        data = data[data['FilePath'] != value]
I keep getting the KeyError near the line "if '.' not in data['FilePath'][i]". Please help me fix this error.
If I understand your logic correctly, then you should be able to do this without a loop. From what I can see, it looks like you want to drop rows if the FilePath column does not contain a . character. If this is correct, then below is one way to do this:
Create sample data using a nested list:
d = [
    ['BytesAccessed','FilePath','DateTime'],
    [0, '/lib/x86_64-linux-gnu/libtinfo.so.5 832.0', '[28/Jun/2018:11:53:09]'],
    [1, './lib/x86-linux-gnu/yourtext.so.6 932.0', '[28/Jun/2018:11:53:09]'],
    [2, '/lib/x86_64-linux-gnu/mytes0', '[28/Jun/2018:11:53:09]'],
]
data = pd.DataFrame(d[1:], columns=d[0])
print(data)
BytesAccessed FilePath DateTime
0 0 /lib/x86_64-linux-gnu/libtinfo.so.5 832.0 [28/Jun/2018:11:53:09]
1 1 ./lib/x86-linux-gnu/yourtext.so.6 932.0 [28/Jun/2018:11:53:09]
2 2 /lib/x86_64-linux-gnu/mytes0 [28/Jun/2018:11:53:09]
Filter the data to drop rows that do not contain . anywhere in the FilePath column:
data_filtered = (data.set_index('FilePath')
                     .filter(like='.', axis=0)
                     .reset_index())[data.columns]
print(data_filtered)
BytesAccessed FilePath DateTime
0 0 /lib/x86_64-linux-gnu/libtinfo.so.5 832.0 [28/Jun/2018:11:53:09]
1 1 ./lib/x86-linux-gnu/yourtext.so.6 932.0 [28/Jun/2018:11:53:09]
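As for why the original loop fails: it reassigns data while still iterating over the index labels of the old frame, so after some rows are dropped data['FilePath'][i] asks for a label that no longer exists. A boolean mask avoids the loop entirely. Here is a sketch with shortened, made-up paths:

```python
import pandas as pd

# Hypothetical, shortened sample paths.
data = pd.DataFrame({"FilePath": ["/lib/libtinfo.so.5",
                                  "./lib/yourtext.so.6",
                                  "/lib/mytes0"]})

# regex=False makes '.' a literal dot rather than the "any character" pattern.
mask = data["FilePath"].str.contains(".", regex=False)

# Keep only the rows whose path contains a dot.
data_filtered = data[mask].reset_index(drop=True)
print(data_filtered)
```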