Getting KeyError using Pandas when accessing .csv files - python

For some reason pandas is throwing an error when looking through some .csv stock data I have. Here is the error:
Traceback (most recent call last):
File "/usr/local/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 3078, in get_loc
return self._engine.get_loc(key)
File "pandas/_libs/index.pyx", line 140, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/index.pyx", line 162, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/hashtable_class_helper.pxi", line 1492, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas/_libs/hashtable_class_helper.pxi", line 1500, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'Date'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "./python-for-finance-7.py", line 75, in <module>
compile_data()
File "./python-for-finance-7.py", line 59, in compile_data
df.set_index('Date', inplace=True)
File "/usr/local/lib/python3.7/site-packages/pandas/core/frame.py", line 3909, in set_index
level = frame[col]._values
File "/usr/local/lib/python3.7/site-packages/pandas/core/frame.py", line 2688, in __getitem__
return self._getitem_column(key)
File "/usr/local/lib/python3.7/site-packages/pandas/core/frame.py", line 2695, in _getitem_column
return self._get_item_cache(key)
File "/usr/local/lib/python3.7/site-packages/pandas/core/generic.py", line 2489, in _get_item_cache
values = self._data.get(item)
File "/usr/local/lib/python3.7/site-packages/pandas/core/internals.py", line 4115, in get
loc = self.items.get_loc(item)
File "/usr/local/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 3080, in get_loc
return self._engine.get_loc(self._maybe_cast_indexer(key))
File "pandas/_libs/index.pyx", line 140, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/index.pyx", line 162, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/hashtable_class_helper.pxi", line 1492, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas/_libs/hashtable_class_helper.pxi", line 1500, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'Date'
It is raised by this code:
import bs4 as bs
import datetime as dt
import os
import pandas as pd
import pandas_datareader.data as web
import pickle
import requests
def compile_data():
    with open("sp500tickers.pickle","rb") as f:
        tickers = pickle.load(f)
    main_df = pd.DataFrame()
    for count,ticker in enumerate(tickers):
        df = pd.read_csv('stock_dfs/{}.csv'.format(ticker),
                         delimiter=',', encoding="utf-8-sig")
        df.set_index('Date', inplace=True)
        df.rename(columns = {'Adj Close':ticker}, inplace=True)
        df.drop(['High','Low','Open','Close','Volume'], 1, inplace=True)
        if main_df.empty:
            main_df = df
        else:
            main_df = main_df.join(df, how='outer')
        print(count)
    print(main_df.head())
    main_df.to_csv('sp500_joined_closes.csv')
compile_data()
The data in the CSV files is arranged like this:
Date High Low Open Close Volume Adj. Close
yyyy-mm-dd $$ $$ $$ $$ $$ $$
I tried changing the casing of Date (i.e. changing Date to date), but then it just throws a different error:
KeyError: "['High', 'Low', 'Open', 'Close', 'Volume'] not found in axis"
Can someone please help??

It looks like you're using the wrong delimiter. The file is whitespace-delimited, not comma-delimited.
Try using a whitespace delimiter:
df = pd.read_csv('stock_dfs/{}.csv'.format(ticker),
                 delimiter=r'\s+', encoding="utf-8-sig")
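A minimal sketch of the difference, assuming a small made-up sample in the same whitespace-separated layout. The sample values and the names raw, df_comma, and df_ws are illustrative and not taken from the question:

```python
import io
import pandas as pd

# Hypothetical sample: whitespace-separated, like the layout shown in the question.
raw = (
    "Date High Low Close\n"
    "2019-01-02 10.5 9.8 10.1\n"
    "2019-01-03 10.9 10.0 10.7\n"
)

# Read as comma-separated: each line is one field, so there is no 'Date' column.
df_comma = pd.read_csv(io.StringIO(raw))
print(list(df_comma.columns))

# Read as whitespace-separated: each header word becomes its own column.
df_ws = pd.read_csv(io.StringIO(raw), sep=r"\s+")
print(list(df_ws.columns))

# Now 'Date' exists and can be used as the index.
df_ws = df_ws.set_index("Date")
```

Printing df.columns right after reading is a cheap way to confirm the delimiter. Note that a header containing a space, such as 'Adj. Close', would itself be split into two names by a whitespace delimiter.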

In my case, the data frame was empty, so there were no entries when I set the index.
It's worth checking
if len(df) > 0:
before setting the index.
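One way to write that guard is sketched below. The helper name set_index_if_possible is hypothetical, and checking for the column as well as for emptiness is an extra assumption on top of the answer:

```python
import pandas as pd

def set_index_if_possible(df, column):
    """Hypothetical helper: return df indexed by `column`, or None if that cannot be done."""
    # An empty frame, or one without the column, would fail or be useless downstream.
    if df.empty or column not in df.columns:
        return None
    return df.set_index(column)

empty = pd.DataFrame()
filled = pd.DataFrame({"Date": ["2019-01-02"], "Close": [10.1]})

print(set_index_if_possible(empty, "Date"))
print(set_index_if_possible(filled, "Date"))
```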

Related

Pandas read_csv() parses dates fine but can't index by date

This is strange.
Data (csv):
Date, Hr 1,Hr 2,Hr 3,..
20070701,1128,1072,1173,..
20070702,1131,1092,1287,..
Pretty vanilla use of pd.read_csv():
df = pd.read_csv(filename,
                 parse_dates=['Date'],
                 index_col=['Date'])
Date seems to parse fine into the index:
print(df.index[:2])
Output:
DatetimeIndex(['2007-07-01', '2007-07-02'], dtype='datetime64[ns]', name='Date', freq=None)
Now if I try to index a single day?
print(df['2007-7-1']) # or any variation on "2007-07-01" etc
Output:
Traceback (most recent call last):
File "/Users/mjw/opt/anaconda3/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 2646, in get_loc
return self._engine.get_loc(key)
File "pandas/_libs/index.pyx", line 111, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/index.pyx", line 138, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/hashtable_class_helper.pxi", line 1619, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas/_libs/hashtable_class_helper.pxi", line 1627, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: '2007-7-1'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "my_file.py", line 108, in <module>
print(df['2007-7-1'])
File "/Users/mjw/opt/anaconda3/lib/python3.8/site-packages/pandas/core/frame.py", line 2800, in __getitem__
indexer = self.columns.get_loc(key)
File "/Users/mjw/opt/anaconda3/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 2648, in get_loc
return self._engine.get_loc(self._maybe_cast_indexer(key))
File "pandas/_libs/index.pyx", line 111, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/index.pyx", line 138, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/hashtable_class_helper.pxi", line 1619, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas/_libs/hashtable_class_helper.pxi", line 1627, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: '2007-7-1'
I've also tried to make sure the DatetimeIndex freq is set right
df = df.asfreq('d')
And I get the same junk.
But indexing by year and month works fine, or indexing by year-month-day after selecting a column:
print(df['2007-7']) # works
print(df['Hr 1']['2007-7-1']) # works
But this does not:
print(df['2007-7-1']['Hr 1'])
I can make a custom date parser but the point is that I shouldn't have to do that. "yyyymmdd" isn't exactly hard or unusual. Come on pandas.
Please and thank you!
Use .loc:
print(df.loc["2007-07-01"])
Prints:
Hr 1 1128
Hr 2 1072
Hr 3 1173
Name: 2007-07-01 00:00:00, dtype: int64
For just value of "Hr 2" column:
print(df.loc["2007-07-01", "Hr 2"])
Prints:
1072
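As the traceback shows, df['2007-7-1'] goes through self.columns.get_loc, so the string is looked up as a column label. Here is a small sketch of the .loc approach using the two sample rows from the question. Reading Date as text and converting it with an explicit format string are choices made for this illustration, not something the question did:

```python
import io
import pandas as pd

raw = (
    "Date,Hr 1,Hr 2,Hr 3\n"
    "20070701,1128,1072,1173\n"
    "20070702,1131,1092,1287\n"
)

# Read Date as text, then convert it with an explicit yyyymmdd format.
df = pd.read_csv(io.StringIO(raw), dtype={"Date": str})
df["Date"] = pd.to_datetime(df["Date"], format="%Y%m%d")
df = df.set_index("Date")

row = df.loc["2007-07-01"]            # one row, selected by index label
value = df.loc["2007-07-01", "Hr 2"]  # one cell: row label first, then column label
print(row)
print(value)
```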

Read csv; replace values and save on csv

I have a csv file and I want to modify the first column by removing all "-".
After that, I want to save the changes in that same first column.
import pandas as pd
clean_order = pd.read_csv('C:/Users/(...)/Page_Clean_test.csv', 'w+', delimiter=';', skiprows=0, low_memory=False)
clean_order.loc[clean_order['web_scraper_order'].fillna('').str.replace('-', ''), 'web_scraper_order']
clean_order.to_csv('C:/Users/(...)/Page_Clean_test.csv', index=False)
Error:
File "C:\Users\suiso\PycharmProjects\Teste_SA\venv\lib\site-packages\pandas\core\indexes\base.py", line 2889, in get_loc
return self._engine.get_loc(casted_key)
File "pandas\_libs\index.pyx", line 70, in pandas._libs.index.IndexEngine.get_loc
File "pandas\_libs\index.pyx", line 97, in pandas._libs.index.IndexEngine.get_loc
File "pandas\_libs\hashtable_class_helper.pxi", line 1675, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas\_libs\hashtable_class_helper.pxi", line 1683, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'web_scraper_order'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "C:/Users/suiso/PycharmProjects/Teste_SA/Clean Data/Dataframe_comments.py", line 21, in <module>
clean_order.loc[clean_order['web_scraper_order'].fillna('').str.replace('-', ''), 'web_scraper_order']
File "C:\Users\suiso\PycharmProjects\Teste_SA\venv\lib\site-packages\pandas\core\frame.py", line 2899, in __getitem__
indexer = self.columns.get_loc(key)
File "C:\Users\suiso\PycharmProjects\Teste_SA\venv\lib\site-packages\pandas\core\indexes\base.py", line 2891, in get_loc
raise KeyError(key) from err
KeyError: 'web_scraper_order'
Try changing:
clean_order.loc[clean_order['web_scraper_order'].fillna('').str.replace('-', ''), 'web_scraper_order']
To:
clean_order = clean_order[clean_order.loc['web_scraper_order'].fillna('').str.replace('-', '')]
You may use:
clean_order['web_scraper_order'] = clean_order['web_scraper_order'].str.replace('-', '')
clean_order.to_csv('filename.csv', index=False)
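A self-contained sketch of that approach, assuming made-up sample rows and in-memory buffers in place of the real file. Note that read_csv takes no file-mode argument, so the 'w+' from the question is left out here:

```python
import io
import pandas as pd

# Hypothetical sample using the column name and ';' delimiter from the question.
raw = (
    "web_scraper_order;text\n"
    "1591-1;first\n"
    "1591-2;second\n"
)

clean_order = pd.read_csv(io.StringIO(raw), delimiter=";")

# Replace the column with a copy that has every '-' removed (plain string match, no regex).
clean_order["web_scraper_order"] = (
    clean_order["web_scraper_order"].str.replace("-", "", regex=False)
)

# Write the result back out; a StringIO buffer stands in for the real path.
out = io.StringIO()
clean_order.to_csv(out, sep=";", index=False)
print(out.getvalue())
```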

Get excel column into a variable

I want to move some xls data into json. I can't just use a ready solution, since this is a bit of a special case.
Here's the excel
Here's the code:
import pandas
xl = pandas.ExcelFile("./data/file.xlsx")
df = xl.parse("2")
x = df["XX"][0]
print(x)
# writing to file
text_file = open("json_files/Output.json", "w")
# text_file.write(json_str)
text_file.close()
Here's the error I'm getting:
Traceback (most recent call last):
File "C:\Users\aironsid\Documents\Capgemini\Excel_to_Json\venv\lib\site-packages\pandas\core\indexes\base.py", line 2646, in get_loc
return self._engine.get_loc(key)
File "pandas\_libs\index.pyx", line 111, in pandas._libs.index.IndexEngine.get_loc
File "pandas\_libs\index.pyx", line 138, in pandas._libs.index.IndexEngine.get_loc
File "pandas\_libs\hashtable_class_helper.pxi", line 1619, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas\_libs\hashtable_class_helper.pxi", line 1627, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'XX'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "excelToJson.py", line 5, in <module>
x = df["XX"][0]
File "C:\Users\aironsid\Documents\Capgemini\Excel_to_Json\venv\lib\site-packages\pandas\core\frame.py", line 2800, in __getitem__
indexer = self.columns.get_loc(key)
File "C:\Users\aironsid\Documents\Capgemini\Excel_to_Json\venv\lib\site-packages\pandas\core\indexes\base.py", line 2648, in get_loc
return self._engine.get_loc(self._maybe_cast_indexer(key))
File "pandas\_libs\index.pyx", line 111, in pandas._libs.index.IndexEngine.get_loc
File "pandas\_libs\index.pyx", line 138, in pandas._libs.index.IndexEngine.get_loc
File "pandas\_libs\hashtable_class_helper.pxi", line 1619, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas\_libs\hashtable_class_helper.pxi", line 1627, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'XX'
It seems to not be able to find the column name.
I'm using this video as reference
Please try this to check which column names pandas actually sees:
import pandas
xl = pandas.ExcelFile("file.xlsx")
# parse the sheet from the question into a DataFrame
df = xl.parse("2")
print(df.columns)
# if you can see the "XX" column in that output
print(df["XX"])
# if that succeeds
dictionary = {"XX": list(df["XX"])}
# writing to file
text_file = open("json_files/Output.json", "w")
# text_file.write(json_str)
text_file.close()
As mentioned in the comments, you need to move the starting point from A1 to B7 in your case. This can be achieved with the skiprows and index_col parameters of pandas.ExcelFile.parse:
import pandas
xl = pandas.ExcelFile(r"path\to\your\file.xlsx")
df = xl.parse("YourSheetName", index_col=1, skiprows=7)
For more documentation/parameters see pandas docs

Python Pandas: Why can't I convert 'Time' to to_datetime? Will not recognize time

Time data looks like this:
Time
20:15:00.0
20:16:00.0
20:17:00.0
20:18:00.0
20:19:00.0
20:20:00.0
20:21:00.0
20:22:00.0
20:23:00.0
20:24:00.0
(data: https://imgur.com/a/LQIjHGt)
Python recognizes these as:
Date object
Time object
Open float64
High float64
Low float64
Last float64
I've tried to import data like this:
df = pd.read_csv('ES_1min_2012_vwap_va.txt', sep=",", nrows=1000, parse_dates=True);
df['Time'] = pd.to_datetime(df['Time'])
**ERROR**:
runfile('C:/Users/user/Desktop/Trading/Main/historical data/Index/ES/Intraday Volatility by VIX.py', wdir='C:/Users/user/Desktop/Trading/Main/historical data/Index/ES')
Traceback (most recent call last):
File "C:\Users\user\miniconda3\lib\site-packages\pandas\core\indexes\base.py", line 2646, in get_loc
return self._engine.get_loc(key)
File "pandas\_libs\index.pyx", line 111, in pandas._libs.index.IndexEngine.get_loc
File "pandas\_libs\index.pyx", line 138, in pandas._libs.index.IndexEngine.get_loc
File "pandas\_libs\hashtable_class_helper.pxi", line 1619, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas\_libs\hashtable_class_helper.pxi", line 1627, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'Time'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\Users\user\Desktop\Trading\Main\historical data\Index\ES\Intraday Volatility by VIX.py", line 18, in <module>
df['Time'] = pd.to_datetime(df['Time'], errors='ignore')
File "C:\Users\user\miniconda3\lib\site-packages\pandas\core\frame.py", line 2800, in __getitem__
indexer = self.columns.get_loc(key)
File "C:\Users\user\miniconda3\lib\site-packages\pandas\core\indexes\base.py", line 2648, in get_loc
return self._engine.get_loc(self._maybe_cast_indexer(key))
File "pandas\_libs\index.pyx", line 111, in pandas._libs.index.IndexEngine.get_loc
File "pandas\_libs\index.pyx", line 138, in pandas._libs.index.IndexEngine.get_loc
File "pandas\_libs\hashtable_class_helper.pxi", line 1619, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas\_libs\hashtable_class_helper.pxi", line 1627, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'Time'
I solved this error a month ago but have completely forgotten how. Please help.
I think there's a space in front of the column name, so it is actually ' Time'. You can use skipinitialspace=True:
df = pd.read_csv('test.csv', sep=',', nrows=1000, parse_dates=True, skipinitialspace=True)
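To illustrate, here is a small sketch with invented rows in which every comma is followed by a space. The sample values and the explicit time format are assumptions for the example, not details of the original file:

```python
import io
import pandas as pd

# Hypothetical sample: note the space after each comma, including in the header.
raw = (
    "Date, Time, Open\n"
    "01/03/2012, 20:15:00.0, 1270.25\n"
    "01/03/2012, 20:16:00.0, 1270.50\n"
)

# Without skipinitialspace the header names keep their leading space.
df_plain = pd.read_csv(io.StringIO(raw))
print(list(df_plain.columns))

# With skipinitialspace=True the space after each delimiter is dropped.
df = pd.read_csv(io.StringIO(raw), skipinitialspace=True)
print(list(df.columns))

# 'Time' now exists and can be converted.
df["Time"] = pd.to_datetime(df["Time"], format="%H:%M:%S.%f")
```

If the file cannot be re-read, stripping the names afterwards with df.columns = df.columns.str.strip() is another option.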

How to resolve date time error in Pandas code?

I have a csv file that has 7 columns ['Date', 'Time', 'Open', 'High', 'Low', 'Close', 'Volume']
The thing is I tried to set a datetime index but it does not work may be because date and time are two separate columns.
Here is the code:
import pandas as pd
column_names = ['Date', 'Time', 'Open', 'High', 'Low','Close', 'Volume']
df = pd.read_csv(r"E:\Tutorial\EURUSD60.csv", header=None, names=column_names)
df['DateTime'] = pd.to_datetime(df['Date', 'Time'])
print(df.head())
Here is the error:
C:\Users\sydgo\Anaconda3\python.exe E:/Tutorial/language.py
Traceback (most recent call last):
File "C:\Users\sydgo\Anaconda3\lib\site-packages\pandas\core\indexes\base.py", line 2442, in get_loc
return self._engine.get_loc(key)
File "pandas\_libs\index.pyx", line 132, in pandas._libs.index.IndexEngine.get_loc
File "pandas\_libs\index.pyx", line 154, in pandas._libs.index.IndexEngine.get_loc
File "pandas\_libs\hashtable_class_helper.pxi", line 1210, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas\_libs\hashtable_class_helper.pxi", line 1218, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: ('Date', 'Time')
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "E:/Tutorial/language.py", line 7, in <module>
df['DateTime'] = pd.to_datetime(df['Date', 'Time'])
File "C:\Users\sydgo\Anaconda3\lib\site-packages\pandas\core\frame.py", line 1964, in __getitem__
return self._getitem_column(key)
File "C:\Users\sydgo\Anaconda3\lib\site-packages\pandas\core\frame.py", line 1971, in _getitem_column
return self._get_item_cache(key)
File "C:\Users\sydgo\Anaconda3\lib\site-packages\pandas\core\generic.py", line 1645, in _get_item_cache
values = self._data.get(item)
File "C:\Users\sydgo\Anaconda3\lib\site-packages\pandas\core\internals.py", line 3590, in get
loc = self.items.get_loc(item)
File "C:\Users\sydgo\Anaconda3\lib\site-packages\pandas\core\indexes\base.py", line 2444, in get_loc
return self._engine.get_loc(self._maybe_cast_indexer(key))
File "pandas\_libs\index.pyx", line 132, in pandas._libs.index.IndexEngine.get_loc
File "pandas\_libs\index.pyx", line 154, in pandas._libs.index.IndexEngine.get_loc
File "pandas\_libs\hashtable_class_helper.pxi", line 1210, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas\_libs\hashtable_class_helper.pxi", line 1218, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: ('Date', 'Time')
If you simplify your code, you'll see the error is right here:
df['Date', 'Time']
That's because this looks up a single column whose label is the tuple ('Date', 'Time'), and no such column exists. To select both columns, index with a list of the two names:
df[['Date', 'Time']]
Still, this may fail, because to_datetime expects a single Series of strings, not a DataFrame with two string columns:
pd.to_datetime(df[['Date', 'Time']])
In which case try this:
pd.to_datetime(df.Date + ' ' + df.Time)
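Put together, this might look like the sketch below. The two sample rows and the date and time formats ("%Y.%m.%d %H:%M") are assumptions for illustration, since the question does not show the file contents:

```python
import io
import pandas as pd

# Hypothetical rows; the real EURUSD60.csv layout may differ.
raw = (
    "2019.01.02,10:00,1.1400,1.1420,1.1390,1.1410,1200\n"
    "2019.01.02,11:00,1.1410,1.1430,1.1400,1.1425,1350\n"
)

column_names = ['Date', 'Time', 'Open', 'High', 'Low', 'Close', 'Volume']
df = pd.read_csv(io.StringIO(raw), header=None, names=column_names)

# Join the two text columns into one string per row, then parse that string.
df['DateTime'] = pd.to_datetime(df['Date'] + ' ' + df['Time'],
                                format="%Y.%m.%d %H:%M")

# Use the combined column as the datetime index.
df = df.set_index('DateTime')
print(df.head())
```

Passing an explicit format is optional, but it avoids relying on pandas guessing the layout.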
