How do I set the Date column as the index? I'm getting this error:
AttributeError: 'DataFrame' object has no attribute 'Date'
How can I fix this?
The Date column is already the index, isn't it? That is why df.Date raises the AttributeError: attribute access finds columns, not the index.
You can reset the index and set it again like this if you want to try, but you will get the same result.
However, if you want to modify your Date column, you can do it by resetting the index, modifying the column, and then setting it back as the index.
import pandas as pd
import pandas_datareader as web
df = web.DataReader('^BSESN', data_source='yahoo', start='2015-07-16', end='2020-07-16')
df.reset_index(level=0, inplace=True)
# If you want to modify your index column, you can do it here.
df['Date'] = pd.to_datetime(df.Date, format='%Y-%m-%d')
df.index = df['Date']
df.drop('Date', axis=1, inplace=True)
df
Looks like you already have Date as the index.
To set any column as the index, you can also try:
df = df.set_index('Date')
This sets your Date column as the index and, because drop=True is the default, removes it from the columns, so no duplicate of Date remains in the DataFrame. Note that set_index replaces the current index rather than saving it; call reset_index() first if you need to keep the old index as a column.
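As a minimal sketch of that behaviour, the frame below uses made-up sample data (the dates and Close values are illustrative, not taken from the question). It shows that set_index moves Date out of the columns and replaces the previous integer index.

```python
import pandas as pd

# Hypothetical sample data standing in for the downloaded prices
df = pd.DataFrame({
    "Date": ["2020-07-14", "2020-07-15", "2020-07-16"],
    "Close": [100.0, 101.5, 99.8],
})
df["Date"] = pd.to_datetime(df["Date"], format="%Y-%m-%d")

# Date becomes the index; the old 0..2 index is discarded
indexed = df.set_index("Date")

print(indexed.index.name)         # name of the new index
print("Date" in indexed.columns)  # Date is no longer a column
print(indexed.loc[pd.Timestamp("2020-07-15"), "Close"])
```

If the old index matters, calling df.reset_index() before set_index keeps it as an ordinary column.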
Related
I'm attempting to create a date index on a dataframe from a copy of another dataframe, using the unique values. My problem is that the dataframe won't let me set the index to expiration_date, because it's not recognizing the key.
import pandas as pd
import requests
raw_data = requests.get(f"https://cdn.cboe.com/api/global/delayed_quotes/options/SPY.json")
dict_data = pd.DataFrame.from_dict(raw_data.json())
spot_price = dict_data.loc["current_price", "data"]
#create dataframe from options key
data = pd.DataFrame(dict_data.loc["options", "data"])
data['expiration_date'] = str(20) + data['option'].str.extract((r"[A-Z](\d+)")).astype(str)
data["expiration_date"] = pd.to_datetime(data["expiration_date"], format="%Y-%m-%d")
# create date dataframe
date_df = pd.DataFrame(data["expiration_date"].unique())
date_df.index = pd.to_datetime(date_df.index)
date_df.set_index('expiration_date', inplace=True)
print(date_df.index)
print(date_df.index.name)
print(date_df)
This gives me the error: KeyError: "None of ['expiration_date'] are in the columns"
I'm able to get close if I use: date_df.index = pd.to_datetime(date_df.index)
however, I get a strange format for my key, it turns to '1970-01-01 00:00:00.000000000 2022-09-21'
I've tried adding , format="%Y-%m-%d", but it doesn't change the format.
If I use date_df.index = pd.to_datetime(date_df.index).strftime("%Y-%m-%d") it does fix the date format, but I'm still left with 1970-01-01 and my index name is still None.
Using date_df.index.names = ['expiration_date'] lets me change the index name to expiration_date, but the column is still named 0 and the index shows the date 1970-01-01, which I don't want.
                          0
expiration_date
1970-01-01       2022-09-21
Now if I try to set the index I'm still greeted with none of expiration_date are in the columns.
As you can see I'm all over the place, what is the correct way to assign an index for dataframe on a date field?
The commented code is where I'm stuck:
date_df = pd.DataFrame(data["expiration_date"].unique())
date_df.index.names = ['expiration_date']
date_df.index = pd.to_datetime(date_df.index).strftime("%Y-%m-%d")
# date_df.set_index('expiration_date', inplace=True)
print(date_df.index.name)
print(date_df)
If you want to create a DataFrame that is a copy of your first "data" DataFrame, keeps only the first row for each unique 'expiration_date' value, and uses that column as its index, you can use this code:
# copy data DataFrame and set its index as expiration_date
date_df = data.set_index("expiration_date")
# drop duplicated index
date_df=date_df[~date_df.index.duplicated(keep='first')]
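The issue with your existing code is this line: date_df = pd.DataFrame(data["expiration_date"].unique()). It creates a DataFrame indexed from 0 to length - 1 whose only column is labelled 0 and holds your unique values, so there is no 'expiration_date' column for set_index to find. Calling pd.to_datetime on that integer index then interprets 0 as nanoseconds since the epoch, which is where 1970-01-01 comes from. If a frame of unique dates is what you want, you can change the line to: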
date_df = pd.DataFrame(data["expiration_date"].unique(),columns=["expiration_date"])
date_df.set_index('expiration_date', inplace=True)
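To see both effects side by side, here is a small sketch with made-up expiration dates (the values are assumptions for illustration, not real CBOE data). It shows the column labelled 0, the 1970 epoch that results from converting the integer index, and the named version working with set_index.

```python
import pandas as pd

# Hypothetical stand-in for the options data
data = pd.DataFrame({
    "expiration_date": pd.to_datetime(["2022-09-21", "2022-09-21", "2022-09-23"]),
})

# Without columns=..., the single column is labelled with the integer 0
unnamed = pd.DataFrame(data["expiration_date"].unique())
unnamed_columns = list(unnamed.columns)

# Converting the integer index 0, 1, ... treats the values as nanoseconds since 1970-01-01
epoch_index = pd.to_datetime(unnamed.index)

# Naming the column makes set_index('expiration_date') work
date_df = pd.DataFrame(data["expiration_date"].unique(), columns=["expiration_date"])
date_df = date_df.set_index("expiration_date")

print(unnamed_columns)
print(epoch_index[0])
print(date_df.index.name)
```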
I am using the django_pandas package to obtain a Pandas dataframe from stored Django models.
df = qs.to_dataframe(['time', 'price_open', 'price_high', 'price_low', 'price_close'], index='time')
Now I want to access the dataframe by a datetime, however, this does not work. Printing df looks like this, where the time column has a weird position:
If I run print(df.keys()), I get Index(['price_open', 'price_high', 'price_low', 'price_close'], dtype='object'), but I expected time to be included. df.columns does not contain time either, only the other columns.
How can I access df by a given datetime? Why is time not the key for df? Thanks!
As pointed out by @ThePorcius, reset_index should give you back the time column.
df = df.reset_index()
According to the docs, you can use the on argument of resample to resample on a column instead of the index.
You'll need to make sure that the time column has a datetime dtype.
dailyFrame = (
    df.resample('D', on='time')
    .agg({'price_open': 'first', 'price_high': 'max', 'price_low': 'min', 'price_close': 'last'})
)
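For the original question of looking rows up by a datetime, an alternative is to leave time as the index and go through .loc. The sketch below uses a hand-built frame with invented prices in place of the django_pandas query, so the data and timestamps are assumptions.

```python
import pandas as pd

# Hypothetical prices standing in for qs.to_dataframe(..., index='time')
df = pd.DataFrame({
    "time": pd.to_datetime(["2021-01-01 09:00", "2021-01-01 15:00", "2021-01-02 09:00"]),
    "price_open": [10.0, 11.0, 12.0],
    "price_high": [10.5, 11.5, 12.5],
    "price_low": [9.5, 10.5, 11.5],
    "price_close": [10.2, 11.2, 12.2],
}).set_index("time")

# The index is not a column, so look rows up with .loc instead of df['time']
row = df.loc[pd.Timestamp("2021-01-01 15:00")]
day = df.loc["2021-01-01"]  # partial string match: every row on that day

# With a DatetimeIndex, resample needs no on= argument
daily = df.resample("D").agg(
    {"price_open": "first", "price_high": "max", "price_low": "min", "price_close": "last"}
)
print(daily)
```

Which variant to use is mostly a matter of taste: keeping time as the index gives label-based lookups, while reset_index makes time behave like any other column.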
I am trying to name the index column, but I can't. I want to be able to name it so that I can reference it to view the index values, which are dates. I have tried
df3.rename(columns={0:'Date'}, inplace=True) but it's not working.
Can someone please help me out? Thank you.
Note that the dataframe index cannot be accessed using df['Date'].
If you want to rename the index, you can use DataFrame.rename_axis:
df=df.rename_axis(index='Date')
If you want to access it as a column, you have to turn it into a column using:
df=df.reset_index()
Then you can use:
df['Date']
Otherwise, you can access the index with:
df.index
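Put together, the steps look roughly like this. The frame df3 below is a made-up example with an unnamed date index, since the question's data was only shown as an image.

```python
import pandas as pd

# Hypothetical frame whose date index has no name
df3 = pd.DataFrame(
    {"value": [1, 2]},
    index=pd.to_datetime(["2021-01-01", "2021-01-02"]),
)

# rename(columns=...) only touches column labels; rename_axis names the index itself
named = df3.rename_axis(index="Date")

# reset_index turns the named index into a regular column
as_column = named.reset_index()

print(named.index.name)
print(as_column["Date"])
```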
Since you did not include an example data frame, here is an arbitrary one to demonstrate the idea.
import datetime as dt
import pandas as pd
data = {'C1': [1, 2],
        'Date': [dt.datetime.now().strftime('%Y-%m-%d'), dt.datetime.now().strftime('%Y-%m-%d')]}
df = pd.DataFrame(data)
df.index = df["Date"]  # use the Date column as the index
del df["Date"]  # remove the now-redundant column
print(df.index.name)  # prints the name of the index: Date
print(df)  # print the dataframe
Let's say I have a data frame like this.
                Max      Min     Open    OpenA
Date
2017.10.18  1.18050  1.17858  1.17872  1.18028
2017.10.19  1.18575  1.17676  1.17804  1.18565
2017.10.20  1.18575  1.17621  1.17642  1.18532
2017.10.23  1.17770  1.17245  1.17281  1.17763
2017.10.24  1.17924  1.17423  1.17430  1.17866
I want to refer to the data['Date'] column, but I get this error:
KeyError: 'Date'
Cheers!
You can use reset_index and then treat it as a column:
df = df.reset_index()
df['date']
OR
you can use df.index.tolist(). This will return you the values.
Ex:
In [2918]: df
Out[2918]:
            emp_id
date
10/1/2018  staff_1
10/1/2018  staff_2
10/1/2018  staff_3
In [2922]: df.index.tolist()
Out[2922]: ['10/1/2018', '10/1/2018', '10/1/2018']
OR
In [2924]: df = df.reset_index()
In [2926]: df['date']
Out[2926]:
0    10/1/2018
1    10/1/2018
2    10/1/2018
That isn't actually a column; it is the index. Use data.index to fetch the values without changing the current structure of the DataFrame.
You can also use data.reset_index() to turn it into a column.
Note: don't use data.reset_index(drop=True), as that discards the current index without making it a column.
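A quick sketch of the difference, using two rows copied from the question's table (the frame is rebuilt by hand here, so treat it as an approximation of the real data):

```python
import pandas as pd

# Two rows from the question, with Date as the index
data = pd.DataFrame(
    {"Max": [1.18050, 1.18575], "Min": [1.17858, 1.17676]},
    index=pd.Index(["2017.10.18", "2017.10.19"], name="Date"),
)

# Read the index values without changing the frame
dates = data.index.tolist()

# reset_index keeps the dates as a column; drop=True throws them away
kept = data.reset_index()
dropped = data.reset_index(drop=True)

print(dates)
print(list(kept.columns))
print(list(dropped.columns))
```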
Trying to assign a date to a column in a DataFrame.
Assigning in the following way gives an error
for date in sorted(list(set(dates))):
    df.loc[:, 'DATE'] = date
Error: Cannot set a frame with no defined index and a scalar
Okay, fine:
for date in sorted(list(set(dates))):
    df['DATE'] = date
Warning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc ...
What exactly does Python want me to do so that I avoid the error without getting a warning instead?
Many thanks!
If you are sure that len(sorted(list(set(dates)))) == len(df), then you can simply do:
df['DATE'] = sorted(list(set(dates)))
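Here is a small sketch of that approach with the length check made explicit. The dates list and the value column are invented for illustration; the point is that one assignment fills one row per unique date, whereas the loop overwrites the whole column with a single scalar on every pass.

```python
import pandas as pd

# Hypothetical inputs: a list with duplicate dates and a three-row frame
dates = ["2020-01-02", "2020-01-01", "2020-01-02", "2020-01-03"]
df = pd.DataFrame({"value": [1, 2, 3]})

unique_dates = sorted(set(dates))

# Assign the whole column at once, but only if the lengths match
if len(unique_dates) == len(df):
    df["DATE"] = unique_dates

print(df)
```

If the lengths differ, pandas raises a ValueError on the assignment, so checking first gives a chance to handle that case deliberately.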