I am using the django_pandas package to obtain a Pandas dataframe from stored Django models.
df = qs.to_dataframe(['time', 'price_open', 'price_high', 'price_low', 'price_close'], index='time')
Now I want to access the dataframe by a datetime; however, this does not work. Printing df looks like this, where the time column has a weird position:
If I do print(df.keys()), I get Index(['price_open', 'price_high', 'price_low', 'price_close'], dtype='object'), but I expected time to be included. Also, df.columns does not contain time, only the other columns.
How can I access df by a given datetime? Why is time not the key for df? Thanks!
As pointed out by @ThePorcius, reset_index should give you back the time column.
df = df.reset_index()
According to the docs, you can use the on argument of resample to resample on a column instead of the index.
You'll need to make sure that the time column has a datetime dtype.
dailyFrame = (
    df.resample('D', on='time')
    .agg({'price_open': 'first', 'price_high': 'max', 'price_low': 'min', 'price_close': 'last'})
)
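To answer the other half of the question (looking up rows by a given datetime): if you keep time as the index, label-based selection with df.loc should work. Below is a minimal sketch, assuming the index is a DatetimeIndex; the sample prices are made up for illustration and do not come from the original post.

```python
import pandas as pd

# Hypothetical sample data standing in for the django_pandas result
df = pd.DataFrame(
    {
        "time": pd.to_datetime(
            ["2021-01-01 09:00", "2021-01-01 10:00", "2021-01-02 09:00"]
        ),
        "price_open": [10.0, 11.0, 12.0],
        "price_close": [11.0, 12.0, 13.0],
    }
).set_index("time")

# 'time' is the index, so it does not appear among the columns
print("time" in df.columns)
print(df.index.name)

# Select a single row by its exact timestamp
row = df.loc[pd.Timestamp("2021-01-01 10:00")]

# Partial string indexing: all rows that fall on 2021-01-01
day = df.loc["2021-01-01"]
print(row)
print(day)
```

This is also why df.keys() and df.columns do not list time: the index is stored separately from the columns and is reached through df.index or df.loc.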
Using Python pandas, how can we change the data frame?
First, how can I copy the column name down to another cell (blue)?
Second, how can I delete the row and the index column (orange)?
Third, how can I modify the date format (green)?
I would appreciate any feedback~~
Update
df.iloc[1,1] = df.columns[0]
df = df.iloc[1:].reset_index(drop=True)
df.columns = df.iloc[0]
df = df.drop(df.index[0])
df = df.set_index('Date')
print(df.columns)
Question 1 - How to copy column name to a column (Edit- Rename column)
To rename a column, use pandas.DataFrame.rename or assign to df.columns directly. The two options below are alternatives; use one or the other.
# Option 1: assign all column names at once
# The list must have 2 entries because you have 2 columns
df.columns = ['Date','Asia Pacific Equity Fund']
# Option 2: rename selected columns using pandas.DataFrame.rename
df.rename(columns = {'Asia Pacific Equity Fund':'Date',"Unnamed: 1":"Asia Pacific Equity Fund"}, inplace = True)
df.columns returns all the column names of the dataframe, and you can access each name by its position.
Please refer to "Rename unnamed column pandas dataframe" for changing unnamed columns.
Question 2 - Delete a row
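# Keep the rows from position 1 onward (drops the first row);
# drop=True avoids adding the old index as a new column
df = df.iloc[1:].reset_index(drop=True)
# Alternatively, remove specific rows by index label
df = df.drop([0,1]).reset_index(drop=True)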
Question 3 - Modify the date format
current_format = '%Y-%m-%d %H:%M:%S'
desired_format = "%Y-%m-%d"
# Let pandas work out the existing format
df['Date'] = pd.to_datetime(df['Date']).dt.strftime(desired_format)
# Or state the existing format explicitly (use this instead of the line above)
df['Date'] = pd.to_datetime(df['Date'], format=current_format).dt.strftime(desired_format)
# To update the date format of the index
df.index = pd.to_datetime(df.index, format=current_format).strftime(desired_format)
Please refer to pandas.to_datetime for more details
I'm not sure I understand your questions. Do you actually want to change the dataframe, or only how it is printed/displayed?
Indexes can be changed with the methods .set_index() or .reset_index(), or dropped entirely. If you just want to remove the first digit from each index value (that's what I understood from the orange column), you should create a list with the new index values and pass it as a column to your dataframe.
Regarding the date format, it depends on what you want the new format to be. Take a look at Python's datetime module.
I would strongly suggest taking a closer look at pandas' features and documentation, and at how to handle a dataframe with this library. There are plenty of great sources a Google search away :)
Delete the first two rows using this.
Rename the second column using this.
Work with datetime format using the datetime package. Read about it here
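The three steps in this answer can be sketched roughly as follows. The frame below is a guess at what the spreadsheet import might look like (the original screenshot is not available), so the junk rows, values, and date format are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical frame resembling a spreadsheet import with stray header rows
df = pd.DataFrame(
    {
        "Asia Pacific Equity Fund": [
            "junk", "Date", "2020-01-01 00:00:00", "2020-01-02 00:00:00"
        ],
        "Unnamed: 1": [None, None, 1.23, 1.25],
    }
)

# 1. Delete the first two rows and renumber the index from 0
df = df.iloc[2:].reset_index(drop=True)

# 2. Rename the columns
df = df.rename(
    columns={
        "Asia Pacific Equity Fund": "Date",
        "Unnamed: 1": "Asia Pacific Equity Fund",
    }
)

# 3. Parse the dates, then keep only the date part as text
df["Date"] = pd.to_datetime(
    df["Date"], format="%Y-%m-%d %H:%M:%S"
).dt.strftime("%Y-%m-%d")

print(df)
```

If you need real datetime values afterwards (for sorting or resampling), skip the strftime call and keep the column as datetime instead of text.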
How do I set the Date column as the index? I'm getting an error
AttributeError: 'DataFrame' object has no attribute 'Date'
How to fix this?
The Date column is already the index column, isn't it?
You can reset the index column and set it again like this if you want to try.
You will get the same result.
However, if you want to modify your Date column, you can do it by resetting the index column, modifying it, and then setting it back to the index.
import pandas as pd
import pandas_datareader as web
df = web.DataReader('^BSESN', data_source='yahoo', start='2015-07-16', end='2020-07-16')
df.reset_index(level=0, inplace=True)
# If you want to modify your index column, you can do it here.
df['Date'] = pd.to_datetime(df.Date, format='%Y-%m-%d')
df.index = df['Date']
df.drop('Date', axis=1, inplace=True)
df
Looks like you already have Date as index.
To set any column as index, you can also try:
df = df.set_index('Date')
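This sets your Date column as the index and, because drop=True is the default, removes it from the columns, so no duplicate of Date remains in the DataFrame. Note that set_index replaces the current index rather than saving it; call reset_index() first if you want to keep the old index as a column.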
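Here is a small sketch of what set_index does to the column and to the previous index. The frame, its values, and the "a"/"b" index labels are invented for illustration.

```python
import pandas as pd

# Hypothetical frame with an arbitrary existing index
df = pd.DataFrame(
    {"Date": pd.to_datetime(["2020-07-15", "2020-07-16"]), "Close": [100.0, 101.5]},
    index=["a", "b"],
)

indexed = df.set_index("Date")
print(indexed.index.name)         # name of the new index
print("Date" in indexed.columns)  # the column is removed because drop=True is the default

# To keep the old index, turn it into a column before setting the new index
kept = df.reset_index().set_index("Date")
print(list(kept.columns))
```

This also explains the AttributeError in the question: once Date is the index it is no longer a column, so df.Date and df['Date'] cannot find it, while df.index still can.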
I am trying to name the index column, but I can't. I want to be able to name it so that I can reference it to view the index values, which are dates. I have tried
df3.rename(columns={0:'Date'}, inplace=True), but it's not working.
Can someone please help me out? Thank you.
Note that the dataframe index cannot be accessed using df['Date'].
If you want to rename the index, you can use DataFrame.rename_axis:
df=df.rename_axis(index='Date')
If you want to access it as a column, then you have to turn it into a column using:
df=df.reset_index()
then you can use:
df['Date']
otherwise you can access the index by:
df.index
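Putting these pieces together, here is a minimal sketch. The frame df3 and its values are invented for illustration, since the original data was only posted as an image.

```python
import pandas as pd

# Hypothetical frame with an unnamed DatetimeIndex
df3 = pd.DataFrame(
    {"value": [1, 2]},
    index=pd.to_datetime(["2021-03-01", "2021-03-02"]),
)

# Name the index; this does not create a column
df3 = df3.rename_axis(index="Date")
print(df3.index.name)

# The index values can be read by name without converting to a column
print(df3.index.get_level_values("Date"))

# Turn the index into a regular column when column-style access is needed
as_column = df3.reset_index()
print(as_column["Date"])
```

The rename(columns={0: 'Date'}) attempt in the question has no effect because the index is not one of the columns, so there is no column labelled 0 to rename.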
Since you did not post an example data frame, here is an arbitrary example to demonstrate the idea.
import datetime as dt
import pandas as pd
data = {'C1': [1, 2],
        'Date': [dt.datetime.now().strftime('%Y-%m-%d'), dt.datetime.now().strftime('%Y-%m-%d')]}
df = pd.DataFrame(data)
df.index = df["Date"]
del df["Date"]
print(df.index.name)  # prints the name of the index, which is now 'Date'
print(df)  # print the dataframe
I have several dataframes in which some columns contain dates in the ASP.NET format "/Date(1239018869048)/". I've figured out how to parse this into Python's datetime format for a given column. However, I would like to put this logic into a function so that I can pass it any dataframe and have it replace all the dates that match a regex using pd.DataFrame.replace.
something like:
def pretty_dates(df):
    # Messy logic here
    pass
df.replace(to_replace=r'/Date\((\d+)\)/', value=pretty_dates(df), regex=True)
Problem with this is that the df that is being passed to pretty_dates is the whole dataframe not just the cell that is needed to be replaced.
So the concept I'm trying to figure out is if there is a way that the value that should be replaced when using df.replace can be a function instead of a static value.
Thank you so much in advance
EDIT
To try to add some clarity: I have many columns in a dataframe, over a hundred, that contain this date format. I would rather not list out every single column that has a date. Is there a way to apply the function to clean my dates across all the columns in my dataset? So I do not want to clean 1 column but all the hundreds of columns of my dataframe.
I'm sure you can use regex to do this in one step, but here is how to apply it to the whole column at once:
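import pandas as pd
df = pd.Series(['/Date(1239018869048)/',
                '/Date(1239018869048)/'], dtype=str)
# regex=True is needed in pandas 2.0+, where str.replace matches literally by default
df = df.str.replace(r'/Date\(', '', regex=True)
df = df.str.replace(r'\)/', '', regex=True)
print(df)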
0 1239018869048
1 1239018869048
dtype: object
As far as I understand, you need to apply a custom function to selected cells in a specified column. I hope the following example helps:
import pandas as pd
df = pd.DataFrame({'x': ['one', 'two', 'three']})
selection = df.x.str.contains('t', regex=True) # put your regexp here
df.loc[selection, 'x'] = df.loc[selection, 'x'].map(lambda x: x+x) # do some logic instead
You can apply this procedure to all columns of the df in a loop:
for col in df.columns:
    selection = df.loc[:, col].str.contains('t', regex=True)  # put your regexp here
    df.loc[selection, col] = df.loc[selection, col].map(lambda x: x+x)  # do some logic instead
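Adapting the loop idea to the ASP.NET dates in the question, one possible sketch is below. As far as I know, DataFrame.replace does not accept a function as value, so this goes through Series.str.extract instead. The helper name pretty_dates is borrowed from the question; the pattern, the sample frame, and the rule "convert a column only if every non-null value matches" are my assumptions. It also assumes the values carry no timezone suffix such as +0100.

```python
import pandas as pd

# Captures the epoch milliseconds inside /Date(...)/ (assumed format, no offset suffix)
ASPNET_DATE = r"^/Date\((-?\d+)\)/$"


def pretty_dates(df):
    """Return a copy in which columns made up of ASP.NET dates are parsed to datetime."""
    out = df.copy()
    for col in out.columns:
        # Series with the captured digits, or NaN where the pattern does not match
        millis = out[col].astype(str).str.extract(ASPNET_DATE, expand=False)
        matched = millis.notna()
        # Convert only when at least one value matched and every non-null value matched
        if matched.any() and (matched == out[col].notna()).all():
            out[col] = pd.to_datetime(pd.to_numeric(millis), unit="ms")
    return out


# Hypothetical sample frame
df = pd.DataFrame(
    {
        "created": ["/Date(1239018869048)/", "/Date(1239018869048)/"],
        "name": ["a", "b"],
    }
)
clean = pretty_dates(df)
print(clean.dtypes)
print(clean)
```

Because every column is checked, there is no need to list the date columns by hand; columns that do not match the pattern are left untouched.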
I need to filter out data with specific hours. The DataFrame function between_time seems to be the proper way to do that, however, it only works on the index column of the dataframe; but I need to have the data in the original format (e.g. pivot tables will expect the datetime column to be with the proper name, not as the index).
This means that each filter looks something like this:
df.set_index(keys='my_datetime_field').between_time('8:00','21:00').reset_index()
Which implies that there are two reindexing operations every time such a filter is run.
Is this a good practice or is there a more appropriate way to do the same thing?
Create a DatetimeIndex, but store it in a variable, not in the DataFrame.
Then call its indexer_between_time method. This returns an integer array, which can then be used to select rows from df using iloc:
import pandas as pd
import numpy as np
N = 100
df = pd.DataFrame(
    {'date': pd.date_range('2000-1-1', periods=N, freq='H'),
     'value': np.random.random(N)})
index = pd.DatetimeIndex(df['date'])
df.iloc[index.indexer_between_time('8:00','21:00')]
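If you would rather not build an index at all, a boolean mask on the column's time of day is another option. Note that this is a different technique from indexer_between_time, shown only as a sketch; the column names mirror the example above, and both endpoints are treated as inclusive to match the default behaviour of between_time.

```python
import datetime as dt

import numpy as np
import pandas as pd

N = 100
df = pd.DataFrame(
    {
        # Hourly timestamps; a Timedelta is used as the frequency
        "date": pd.date_range("2000-01-01", periods=N, freq=pd.Timedelta(hours=1)),
        "value": np.random.random(N),
    }
)

# Series of datetime.time objects, one per row
times = df["date"].dt.time

# Keep rows whose time of day lies between 08:00 and 21:00, inclusive
mask = (times >= dt.time(8, 0)) & (times <= dt.time(21, 0))
filtered = df[mask]
print(filtered.head())
```

The mask compares Python time objects row by row, so on large frames the indexer_between_time approach is likely to be faster; the mask version is mainly easier to combine with other conditions.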