Convert float into time in a pandas series - python

Please see the data here: screenshot from Google Colab.
I am trying to assign the time 19:00 (7 pm) to all records of the column "Beginn_Zeit". For now I have put in the float 19.00. I need to convert it to a time format so that I can subsequently combine it with a date from the column "Beginn_Datum". Once I have this combined column, I need to paste its value into all records with NaT in a different column, "Delta2".
# parse the raw date strings (day first) into datetimes
dfd['Beginn'] = pd.to_datetime(df['Beginn'], dayfirst=True)
dfd['Ende'] = pd.to_datetime(df['Ende'], dayfirst=True)
# duration between start and end
dfd['Delta2'] = dfd['Ende'] - dfd['Beginn']
# where no end is recorded, fall back to the start
dfd.Ende.fillna(dfd.Beginn, inplace=True)
# date part of the start timestamp
dfd['Beginn_Datum'] = dfd['Beginn'].dt.date
# the start time, stored as the float 19.00 for now
dfd["Beginn_Zeit"] = 19.00

Edited to better match your updated example.
from datetime import time, datetime
# assign a constant time of 19:00 to every row
dfd['Beginn_Zeit'] = time(19, 0)
# create new column combining date and time
new_col = dfd.apply(lambda row: datetime.combine(row['Beginn_Datum'], row['Beginn_Zeit']), axis=1)
# replace null values in Delta2 with new combined dates
dfd.loc[dfd['Delta2'].isnull(), 'Delta2'] = new_col
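If "Beginn_Zeit" is still held as a float number of hours (19.00), a vectorized alternative is to build the timestamps directly from the date column plus a timedelta. This is a minimal sketch assuming "Beginn_Datum" holds dates and "Beginn_Zeit" holds hours as floats:
# assumption: Beginn_Zeit is a float number of hours (e.g. 19.0)
combined = pd.to_datetime(dfd['Beginn_Datum']) + pd.to_timedelta(dfd['Beginn_Zeit'], unit='h')
# fill the NaT rows of Delta2 with the combined timestamps
dfd.loc[dfd['Delta2'].isnull(), 'Delta2'] = combined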

Related

Setting an index on a date from a copy of a dataframe to another?

I'm attempting to create a date index on a dataframe from a copy of another dataframe, using its unique values. My problem is that it won't let me set the index name to expiration_date, because it's not recognizing the key.
import pandas as pd
import requests
raw_data = requests.get(f"https://cdn.cboe.com/api/global/delayed_quotes/options/SPY.json")
dict_data = pd.DataFrame.from_dict(raw_data.json())
spot_price = dict_data.loc["current_price", "data"]
#create dataframe from options key
data = pd.DataFrame(dict_data.loc["options", "data"])
data['expiration_date'] = str(20) + data['option'].str.extract((r"[A-Z](\d+)")).astype(str)
data["expiration_date"] = pd.to_datetime(data["expiration_date"], format="%Y-%m-%d")
# create date dataframe
date_df = pd.DataFrame(data["expiration_date"].unique())
date_df.index = pd.to_datetime(date_df.index)
date_df.set_index('expiration_date', inplace=True)
print(date_df.index)
print(date_df.index.name)
print(date_df)
This gives me the error: KeyError: "None of ['expiration_date'] are in the columns"
I'm able to get close if I use date_df.index = pd.to_datetime(date_df.index); however, I get a strange format for my key, and it turns into '1970-01-01 00:00:00.000000000 2022-09-21'.
I've tried adding format="%Y-%m-%d", but it doesn't change the format.
If I use date_df.index = pd.to_datetime(date_df.index).strftime("%Y-%m-%d") it does fix the date format, but I'm still left with 1970-01-01 and my index name is still None.
Using date_df.index.names = ['expiration_date'] will let me change the index name to expiration_date, but my index is still 0 and it adds a column for the 1970 date, which I don't want:
                          0
expiration_date
1970-01-01       2022-09-21
Now if I try to set the index, I'm still greeted with "None of ['expiration_date'] are in the columns".
As you can see, I'm all over the place. What is the correct way to assign a date field as the index of a dataframe?
The commented code is where I'm stuck:
date_df = pd.DataFrame(data["expiration_date"].unique())
date_df.index.names = ['expiration_date']
date_df.index = pd.to_datetime(date_df.index).strftime("%Y-%m-%d")
# date_df.set_index('expiration_date', inplace=True)
print(date_df.index.name)
print(date_df)
If you want to create a DataFrame that is a copy of your first "data" DataFrame, with the unique values of the 'expiration_date' column as its index, you can use this code:
# copy data DataFrame and set its index as expiration_date
date_df = data.set_index("expiration_date")
# drop duplicated index
date_df = date_df[~date_df.index.duplicated(keep='first')]
The issue with your existing code is this line: date_df = pd.DataFrame(data["expiration_date"].unique()). It creates a DataFrame indexed from 0 to the length, whose first column (named "0") holds your unique values. If this is what you want, you can change this line to:
date_df = pd.DataFrame(data["expiration_date"].unique(),columns=["expiration_date"])
date_df.set_index('expiration_date', inplace=True)
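An equivalent shortcut, if you only need the unique dates as the index, is to drop duplicates before setting the index; a small sketch assuming "data" is the options DataFrame built above:
# keep one row per expiration_date and use it as the index
date_df = data[['expiration_date']].drop_duplicates().set_index('expiration_date')
print(date_df.index.name)  # expiration_date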

python pandas: how to modify column header name and modify the date format

Using Python pandas, how can we change the data frame?
First, how do I copy the column name down to the other cell (blue)?
Second, how do I delete the row and the index column (orange)?
Third, how do I modify the date format (green)?
I would appreciate any feedback.
Update
# copy the original header text into the cell at row 1, column 1
df.iloc[1, 1] = df.columns[0]
# drop the first row, then promote the next row to the header
df = df.iloc[1:].reset_index(drop=True)
df.columns = df.iloc[0]
df = df.drop(df.index[0])
# use the Date column as the index
df = df.set_index('Date')
print(df.columns)
Question 1 - How to copy a column name to a column (Edit: rename the column)
To rename a column, use pandas.DataFrame.rename:
df.columns = ['Date','Asia Pacific Equity Fund']
# Here the list size should be 2 because you have 2 columns
# Rename using pandas pandas.DataFrame.rename
df.rename(columns = {'Asia Pacific Equity Fund':'Date',"Unnamed: 1":"Asia Pacific Equity Fund"}, inplace = True)
df.columns returns all the columns of the dataframe, where you can access each column name by its index.
Please refer to "Rename unnamed column pandas dataframe" for changing unnamed columns.
Question 2 - Delete a row
# keep rows from the second row onwards and reset the index
df = df.iloc[1:].reset_index(drop=True)
# alternatively, drop specific rows by label
df.drop([0, 1]).reset_index(drop=True)
Question 3 - Modify the date format
current_format = '%Y-%m-%d %H:%M:%S'
desired_format = "%Y-%m-%d"
df['Date'] = pd.to_datetime(df['Date']).dt.strftime(desired_format)
# give the existing format explicitly (infer_datetime_format expects a boolean,
# so an explicit format string goes in the format argument)
df['Date'] = pd.to_datetime(df['Date'], format=current_format).dt.strftime(desired_format)
# to update the date format of the index
df.index = pd.to_datetime(df.index, format=current_format).strftime(desired_format)
Please refer to pandas.to_datetime for more details.
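Putting the pieces together, here is a minimal sketch on hypothetical data shaped like the screenshot (a header row that needs to be promoted, an old index to drop, and timestamps to reformat); the column names and values are assumptions for illustration:
import pandas as pd

# hypothetical data: the real headers sit in the first data row
df = pd.DataFrame({
    'Asia Pacific Equity Fund': ['Date', '2021-01-04 00:00:00', '2021-01-05 00:00:00'],
    'Unnamed: 1': ['Asia Pacific Equity Fund', '10.5', '10.7'],
})

df.columns = df.iloc[0]                  # promote the first row to the header
df = df.iloc[1:].reset_index(drop=True)  # drop that row and the old integer index
df['Date'] = pd.to_datetime(df['Date']).dt.strftime('%Y-%m-%d')  # reformat the dates
df = df.set_index('Date')
print(df)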
I'm not sure I understand your questions. Do you actually want to change the dataframe, or only how it is printed/displayed?
Indexes can be changed with the .set_index() or .reset_index() methods, or dropped entirely. If you just want to remove the first digit from each index (that's what I understood from the orange column), you should create a list with the new indexes and pass it as a column to your dataframe.
Regarding the date format, it depends on what you want the changed format to become; take a look at Python's datetime module.
I would strongly suggest taking a closer look at the pandas documentation and how to handle a dataframe with this library. There are plenty of great sources a Google search away :)
Delete the first two rows with df.drop or df.iloc.
Rename the second column with df.rename.
Work with datetime formats using the datetime package; see the Python documentation for details.

How to find percent increase/decrease with datetime values in Python?

I have attached a photo of how the data is formatted when I print the df in Jupyter; please check that for reference.
I set the DATE column as the index, checked the data type of the index, and converted the index to a datetime index.
import pandas as pd
df = pd.read_csv('UMTMVS.csv', index_col='DATE', parse_dates=True)
df.index = pd.to_datetime(df.index)
I need to print out percent increase in value from Month/Year to Month/Year and percent decrease in value from Month/Year to Month/Year.
dataframe format picture
The first correction pertains to how to read your DataFrame.
When passing parse_dates, you should define a list of columns to be parsed as dates, so this instruction should be changed to:
df = pd.read_csv('UMTMVS.csv', index_col='DATE', parse_dates=['DATE'])
and then the second instruction is not needed.
To find the percent change in UMTMVS column, use: df.UMTMVS.pct_change().
For your data the result is:
DATE
1992-01-01 NaN
1992-02-01 0.110968
1992-03-01 0.073036
1992-04-01 -0.040080
1992-05-01 0.014875
1992-06-01 -0.330455
1992-07-01 0.368293
1992-08-01 0.078386
1992-09-01 0.082884
1992-10-01 -0.030528
1992-11-01 -0.027791
Name: UMTMVS, dtype: float64
You may want to multiply it by 100 to get true percentages.
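If you need the change between two specific Month/Year points rather than between consecutive months, you can index the datetime index directly; a small sketch, assuming both dates exist in the index:
# percent change between two specific months (both dates assumed to be in the index)
start = df.loc['1992-01-01', 'UMTMVS']
end = df.loc['1992-11-01', 'UMTMVS']
print((end - start) / start * 100)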

Difference between dates between corresponding rows in pandas dataframe

Below is an example of a sample pandas dataframe. I am trying to find the difference between the dates in the two rows (with the first row as the base):
PH_number date Type
H09879721 2018-05-01 AccountHolder
H09879731 2018-06-22 AccountHolder
If the difference between two dates is within 90 days, then those two rows should be added to a new pandas dataframe. The date column is of type object.
How can I do this?
Use .diff() (this assumes the date column has already been converted to datetime):
df.date.diff() <= pd.Timedelta(90, 'd')
0 False
1 True
Name: date, dtype: bool
Convert the date column to the datetime64[ns] data type using pd.to_datetime and then subtract as shown:
df['date'] = pd.to_datetime(df['date'])
# if comparing with only the 1st row
mask = (df['date'] - df.loc[0, 'date']).dt.days <= 90
# alternative: mask = (df['date'] - df.loc[0, 'date']).dt.days.le(90)
# if comparing with immediate (consecutive) rows
mask = df['date'].diff().dt.days <= 90
# alternative: mask = df['date'].diff().dt.days.le(90)
df1 = df.loc[mask, :]  # gives you the required rows with all columns
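For reference, an end-to-end sketch with the sample rows from the question:
import pandas as pd

df = pd.DataFrame({
    'PH_number': ['H09879721', 'H09879731'],
    'date': ['2018-05-01', '2018-06-22'],
    'Type': ['AccountHolder', 'AccountHolder'],
})

df['date'] = pd.to_datetime(df['date'])
# keep rows whose date is within 90 days of the first row's date
df1 = df.loc[(df['date'] - df.loc[0, 'date']).dt.days <= 90]
print(df1)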

Find max time delta after sorting data and applying groupby

I have a data frame (df) that contains two columns, 'name' and 'date', with multiple entries per 'name':
name                     date
Official Press FRC       2015-02-19 20:30:00.000
Other Publications BOJ   2015-04-16 07:00:00.000
Bank of Russia           2015-06-11 09:44:37.000
I would like to find the maximum difference in 'dates' for each 'name'. My approach to this was to try and sort the dates while simultaneously grouping by name and then take the difference using .diff(). Below is the code I tried:
grouped = df.sort_values('date').groupby('name')
differences = grouped.diff()
I also tried to approach the problem by constructing a pivot table:
grouped = df.pivot(columns='name', values='date')
I think you need a custom function with diff and max to get the maximum timedelta:
#if necessary convert to datetime
df['date'] = pd.to_datetime(df['date'])
df1 = (df.sort_values('date')
         .groupby('name')['date']
         .apply(lambda x: x.diff().max())
         .reset_index(name='max_diff'))
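Note that diff().max() gives the largest gap between consecutive dates within each name. If "maximum difference" instead means the full span between the earliest and latest date per name, a plain groupby aggregation works; a sketch under that interpretation:
# maximum span (latest minus earliest date) per name
span = (df.groupby('name')['date']
          .agg(lambda x: x.max() - x.min())
          .reset_index(name='max_diff'))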
