I am trying to assign a date to a column in a DataFrame.
Assigning it in the following way gives an error:
for date in sorted(list(set(dates))):
    df.loc[:, 'DATE'] = date
Error: Cannot set a frame with no defined index and a scalar
Okay, fine:
for date in sorted(list(set(dates))):
    df['DATE'] = date
Warning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc ...
What exactly does pandas want me to do here, so that I avoid the error without getting a warning instead?
Many thanks!
If you are sure that len(sorted(list(set(dates)))) == len(df), then you can simply do:
df['DATE'] = sorted(list(set(dates)))
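A minimal sketch of the assignment above, using made-up sample data (the original post does not show `dates` or `df`, so both are hypothetical here). It only assigns when there is exactly one unique date per row, because a length mismatch would raise a ValueError.

```python
import pandas as pd

# Hypothetical sample data; the original post does not show `dates` or `df`.
dates = ["2020-01-02", "2020-01-01", "2020-01-02", "2020-01-03"]
df = pd.DataFrame({"value": [10, 20, 30]})

# sorted() accepts a set directly, so the intermediate list() is optional.
unique_dates = sorted(set(dates))

# Only assign when there is exactly one date per row.
if len(unique_dates) == len(df):
    df["DATE"] = pd.to_datetime(unique_dates)

print(df)
```

Converting with pd.to_datetime is optional; it is used here so that the column holds real timestamps rather than strings.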
Related
I have a Pandas DataFrame whose rows and columns are both labelled by a DatetimeIndex.
import pandas as pd
data = pd.DataFrame(
    {
        "PERIOD_END_DATE": pd.date_range(start="2018-01", end="2018-04", freq="M"),
        "first": list("abc"),
        "second": list("efg")
    }
).set_index("PERIOD_END_DATE")
data.columns = pd.date_range(start="2018-01", end="2018-03", freq="M")
data
Unfortunately, I am getting a variety of errors when I try to pull out a value:
data['2018-01', '2018-02'] # InvalidIndexError: ('2018-01', '2018-02')
data['2018-01', ['2018-02']] # InvalidIndexError: ('2018-01', ['2018-02'])
data.loc['2018-01', '2018-02'] # TypeError: only integer scalar arrays can be converted to a scalar index
data.loc['2018-01', ['2018-02']] # KeyError: "None of [Index(['2018-02'], dtype='object')] are in the [columns]"
How do I extract a value from a DataFrame that uses a DatetimeIndex?
There are two issues.
First, since you are using a DataFrame with a DatetimeIndex on both axes, the correct notation for selecting by row and column is:
a) data.loc[rows_index_name, [column__index_name]]
or
b) data.loc[rows_index_name, column__index_name]
depending on the type of output you want.
Notation (a) returns a Series, while notation (b) returns a single value (here, a string).
Second, the index labels cannot be truncated; you must specify the whole date string.
As such, your issue will be resolved with:
data.loc['2018-01-31', ['2018-01-31']] or data.loc['2018-01-31', '2018-01-31']
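The two notations can be compared with a small sketch. It rebuilds the frame from the question with explicit dates and uses pd.Timestamp keys instead of strings, which is an assumption made here to keep the lookup unambiguous; the variable names are illustrative only.

```python
import pandas as pd

# Rebuild the example frame with explicit dates (same labels as in the question).
rows = pd.to_datetime(["2018-01-31", "2018-02-28", "2018-03-31"])
cols = pd.to_datetime(["2018-01-31", "2018-02-28"])
data = pd.DataFrame([["a", "e"], ["b", "f"], ["c", "g"]], index=rows, columns=cols)

row_key = pd.Timestamp("2018-01-31")
col_key = pd.Timestamp("2018-02-28")

# Notation (b): scalar column label, returns a single cell.
scalar_value = data.loc[row_key, col_key]

# Notation (a): list of column labels, returns a one-element Series.
series_value = data.loc[row_key, [col_key]]

# .at is intended for single-cell access by label.
also_scalar = data.at[row_key, col_key]

print(type(scalar_value), type(series_value))
```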
Once the date was set as the index, I could not slice or extract any data from it. You can extract the month and day from it while it is a regular column, but not while it is the index. I ran into this before, and this was my solution:
I kept it as a regular column, extracted the month, day, and year into a separate column each, and then set the date column as the index.
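The workaround described above might look like the following sketch. The column names date, Year, Month, and Day are hypothetical, as is the sample data.

```python
import pandas as pd

# Hypothetical frame with the date kept as a regular column at first.
df = pd.DataFrame(
    {
        "date": pd.to_datetime(["2018-01-31", "2018-02-28", "2018-03-31"]),
        "value": list("abc"),
    }
)

# Extract the date parts through the .dt accessor while it is a column.
df["Year"] = df["date"].dt.year
df["Month"] = df["date"].dt.month
df["Day"] = df["date"].dt.day

# Only then move the date column into the index.
df = df.set_index("date")

# For comparison: a DatetimeIndex exposes the same fields directly.
months_from_index = df.index.month

print(df)
```

Note that the last line reads the month straight from the index, so extracting the parts before calling set_index is a matter of convenience rather than a requirement.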
You are using a period (YYYY-MM) to access columns that are labelled with dates.
Converting the columns to periods would help in this case:
data.columns = pd.period_range(start="2018-01", end="2018-02", freq='M')
data[['2018-01']]
                2018-01
PERIOD_END_DATE
2018-01-31            a
2018-02-28            b
2018-03-31            c
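As a sketch of the same idea, the monthly column can also be selected with an explicit pd.Period key instead of a string. The frame is rebuilt here with explicit row dates, and the name january is illustrative only.

```python
import pandas as pd

rows = pd.to_datetime(["2018-01-31", "2018-02-28", "2018-03-31"])
data = pd.DataFrame({"first": list("abc"), "second": list("efg")}, index=rows)

# Label the columns by month rather than by a specific day.
data.columns = pd.period_range(start="2018-01", end="2018-02", freq="M")

# Select the January column with an explicit Period key.
january = data[pd.Period("2018-01", freq="M")]

print(january)
```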
Timestamp indexes are finicky. Pandas accepts each of the following expressions, but they return different types.
data.loc['2018-01',['2018-01-31']]
data.loc['2018-01-31',['2018-01-31']]
data.loc['2018-01','2018-01-31']
data.loc['2018-01-31','2018-01']
data.loc['2018-01-31','2018-01-31']
Using Python pandas, how can I change this data frame?
First, how do I copy the column name down to another cell (blue)?
Second, how do I delete the row and the index column (orange)?
Third, how do I modify the date format (green)?
I would appreciate any feedback.
Update
df.iloc[1,1] = df.columns[0]
df = df.iloc[1:].reset_index(drop=True)
df.columns = df.iloc[0]
df = df.drop(df.index[0])
df = df.set_index('Date')
print(df.columns)
Question 1 - How to copy a column name to a column (Edit: rename a column)
To rename columns, either assign to df.columns or use pandas.DataFrame.rename:
df.columns = ['Date','Asia Pacific Equity Fund']
# The list must have 2 entries because the DataFrame has 2 columns
# Alternatively, rename using pandas.DataFrame.rename
df.rename(columns = {'Asia Pacific Equity Fund':'Date',"Unnamed: 1":"Asia Pacific Equity Fund"}, inplace = True)
df.columns returns all the column names of the DataFrame; you can access each name by its position.
Please refer to "Rename unnamed column pandas dataframe" for how to change unnamed columns.
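A small sketch of the rename call, on a hypothetical two-column frame that mimics the layout described in the question (the values are made up). It uses the non-inplace form, which returns a new DataFrame.

```python
import pandas as pd

# Hypothetical frame mimicking the layout described in the question.
df = pd.DataFrame(
    {
        "Asia Pacific Equity Fund": ["2020-01-01", "2020-01-02"],
        "Unnamed: 1": [10.5, 10.7],
    }
)

# Each old label is mapped to its new label; the mapping is applied per label,
# so reusing an old name as a new name for another column is fine.
renamed = df.rename(
    columns={
        "Asia Pacific Equity Fund": "Date",
        "Unnamed: 1": "Asia Pacific Equity Fund",
    }
)

print(list(renamed.columns))
```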
Question 2 - Delete a row
# Keep everything from the second row onward (drops the first row)
df = df.iloc[1:].reset_index()
# To remove specific rows by index label
df.drop([0,1]).reset_index()
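The difference between reset_index() and reset_index(drop=True) is easy to miss, so here is a short sketch on a hypothetical one-column frame. Without drop=True, the old index is kept as a new column named index.

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({"x": [100, 200, 300, 400]})

# Drop the first row by position and renumber the index from 0.
without_first = df.iloc[1:].reset_index(drop=True)

# Drop the rows labelled 0 and 1 and renumber the index from 0.
without_first_two = df.drop([0, 1]).reset_index(drop=True)

# Without drop=True, the old index is preserved as a column called "index".
kept_old_index = df.iloc[1:].reset_index()

print(kept_old_index)
```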
Question 3 - Modify the date format
current_format = '%Y-%m-%d %H:%M:%S'
desired_format = "%Y-%m-%d"
df['Date'] = pd.to_datetime(df['Date']).dt.strftime(desired_format)
# Pass the existing format explicitly via the format argument
df['Date'] = pd.to_datetime(df['Date'], format=current_format).dt.strftime(desired_format)
# To update the date format of the index
df.index = pd.to_datetime(df.index, format=current_format).strftime(desired_format)
Please refer to pandas.to_datetime for more details.
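A runnable sketch of the conversion, on made-up sample values. Keep in mind that strftime produces strings, so the column no longer has a datetime dtype afterwards.

```python
import pandas as pd

current_format = "%Y-%m-%d %H:%M:%S"
desired_format = "%Y-%m-%d"

# Hypothetical sample values in the current format.
df = pd.DataFrame(
    {
        "Date": ["2020-01-31 00:00:00", "2020-02-29 00:00:00"],
        "price": [1.0, 2.0],
    }
)

# Parse with the known format, then render each timestamp as a string.
df["Date"] = pd.to_datetime(df["Date"], format=current_format).dt.strftime(desired_format)

print(df)
```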
I'm not sure I understand your questions. Do you actually want to change the dataframe, or only how it is printed/displayed?
Indexes can be changed with the .set_index() or .reset_index() methods, or dropped altogether. If you just want to remove the first digit from each index (that's what I understood from the orange column), you should create a list with the new indexes and pass it as a column to your dataframe.
Regarding the date format, it depends on what you want the new format to be. Take a look at Python's datetime module.
I would strongly suggest that you take a closer look at pandas' features and documentation, and at how to handle a dataframe with this library. There are plenty of great sources a Google search away :)
Delete the first two rows using this.
Rename the second column using this.
Work with datetime format using the datetime package. Read about it here
I am trying first to slice some columns from the original dataframe and then add an additional column 'INDEX' as the last column.
df = df.iloc[:, np.r_[10:17]] #col 0~6
df['INDEX'] = df.index #col 7
I get a message on the second line saying 'A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead'
Why am I seeing this and how should I solve it?
I would do
df.loc[:,'INDEX'] = df.index
Slicing a DataFrame may return either a view or a copy of the original data, so pandas cannot tell whether an assignment to the slice will also affect the original DataFrame. That is exactly what the message is warning about.
Either of the following will make the Python interpreter happy 😃 :
df = df.iloc[:, np.r_[10:17]].copy()
or
df.loc[:, ['INDEX']] = df.index
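A sketch of the .copy() variant, on a hypothetical 20-column frame (the names wide and subset are illustrative). Taking an explicit copy makes the slice an independent DataFrame, so adding a column to it leaves the original untouched.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with 20 integer-labelled columns.
wide = pd.DataFrame(np.arange(40).reshape(2, 20))

# Take columns 10..16 and make the result an independent DataFrame.
subset = wide.iloc[:, 10:17].copy()

# Adding a column to the copy does not touch `wide`.
subset["INDEX"] = subset.index

print(subset)
```

A plain slice 10:17 selects the same columns as np.r_[10:17] in the question.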
I am using the django_pandas package to obtain a Pandas dataframe from stored Django models.
df = qs.to_dataframe(['time', 'price_open', 'price_high', 'price_low', 'price_close'], index='time')
Now I want to access the dataframe by a datetime, however, this does not work. Printing df looks like this, where the time column has a weird position:
If I do print(df.keys()) I get the following result: Index(['price_open', 'price_high', 'price_low', 'price_close'], dtype='object'), but I expected time to be included. Also, df.columns does not contain time, only the other columns.
How can I access df by a given datetime? Why is time not the key for df? Thanks!
As pointed out by #ThePorcius, reset_index should give you back the time column.
df = df.reset_index()
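To illustrate why time is missing from df.keys(), here is a sketch on a hypothetical frame shaped like the one in the question (the values are made up). A column used as the index is no longer part of the columns, but its values can still be used for row lookups with .loc.

```python
import pandas as pd

# Hypothetical frame with `time` as the index, as produced by index='time'.
df = pd.DataFrame(
    {"price_open": [1.0, 2.0], "price_close": [1.5, 2.5]},
    index=pd.DatetimeIndex(["2021-01-01 09:00", "2021-01-01 15:00"], name="time"),
)

# The index is not part of the columns, so it does not show up in df.keys().
time_is_column = "time" in df.columns

# Rows can still be looked up by datetime through .loc.
row = df.loc[pd.Timestamp("2021-01-01 09:00")]

# reset_index() turns the index back into a regular column.
flat = df.reset_index()

print(time_is_column, list(flat.columns))
```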
According to the docs, you can use the on argument of resample to use a column instead of the index.
You'll need to make sure that the time column is a datetime.
dailyFrame = (
    df.resample('D', on='time')
    .agg({'price_open': 'first', 'price_high': 'max', 'price_low': 'min', 'price_close': 'last'})
)
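The resample call above can be tried on a few made-up rows. The prices and timestamps below are hypothetical; the column names follow the question.

```python
import pandas as pd

# Hypothetical intraday rows with `time` as a regular datetime column.
df = pd.DataFrame(
    {
        "time": pd.to_datetime(
            ["2021-01-01 09:00", "2021-01-01 15:00", "2021-01-02 10:00"]
        ),
        "price_open": [1.0, 2.0, 3.0],
        "price_high": [5.0, 6.0, 7.0],
        "price_low": [0.5, 0.4, 0.3],
        "price_close": [1.5, 2.5, 3.5],
    }
)

# Group by calendar day using the `time` column and aggregate each price field.
daily = df.resample("D", on="time").agg(
    {"price_open": "first", "price_high": "max", "price_low": "min", "price_close": "last"}
)

print(daily)
```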
How do I set the Date column as an index? I'm getting an error:
AttributeError: 'DataFrame' object has no attribute 'Date'
How to fix this?
The Date column is already the index column, isn't it?
You can reset the index column and set it again like this if you want to try.
You will get the same result.
However, if you want to modify your Date column, you can do it by resetting the index column, modifying it, and then setting it back to the index.
import pandas as pd
import pandas_datareader as web
df = web.DataReader('^BSESN', data_source='yahoo', start='2015-07-16', end='2020-07-16')
df.reset_index(level=0, inplace=True)
# If you want to modify your index column, you can do it here.
df['Date'] = pd.to_datetime(df.Date, format='%Y-%m-%d')
df.index = df['Date']
df.drop('Date', axis=1, inplace=True)
df
Looks like you already have Date as index.
To set any column as index, you can also try:
df = df.set_index('Date')
This sets your Date column as the index and removes it from the columns, so no duplicate of Date remains in the DataFrame. Note that the previous index is discarded; call reset_index() first if you want to keep it as a column.
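A short sketch, on made-up data, of why df.Date raises AttributeError once Date is the index: attribute access only finds columns, and set_index moves Date out of them.

```python
import pandas as pd

# Hypothetical frame with Date as a regular column.
df = pd.DataFrame(
    {
        "Date": pd.to_datetime(["2020-07-15", "2020-07-16"]),
        "Close": [1.0, 2.0],
    }
)

indexed = df.set_index("Date")

# Date is now the index, not a column, so attribute access no longer finds it.
date_is_column = "Date" in indexed.columns
date_is_attribute = hasattr(indexed, "Date")

# The values are still available through the index.
dates = indexed.index

# reset_index() brings Date back as a regular column.
restored = indexed.reset_index()

print(date_is_column, date_is_attribute, indexed.index.name)
```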