How to slice dates from a DataFrame using the standard input function? - python

I read the "Indexing and selecting data" section of the pandas documentation, which slices a range of rows from a DataFrame with the dates hard-coded in the script:
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv('d1.csv')
df['time']=pd.to_datetime(df['time'], unit='ns')
df = df.drop(columns='name')  # the positional axis argument is no longer accepted in pandas 2.0+
df['Time'] = df['time'].dt.time
df['date'] = df['time'].dt.date
df['date'] = pd.to_datetime(df['date'])
df = df.set_index(['date'])
df= df.loc['2018-07-04':'2018-07-05']
But I need to select the range using the standard input function. How can this be done?
Rather than hard-coding df = df.loc['2018-07-04':'2018-07-05'], the console should prompt "Enter the start date:" and "Enter the stop date:", and I should then get only the data within the selected date range.
This is what I tried:
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv('d1.csv')
df['time']=pd.to_datetime(df['time'], unit='ns')
df = df.drop(columns='name')  # the positional axis argument is no longer accepted in pandas 2.0+
df['Time'] = df['time'].dt.time
df['date'] = df['time'].dt.date
df['date'] = pd.to_datetime(df['date'])
df = df.set_index(['date'])
Starting_Date = input(" Please Enter the Starting_Date : ")
Ending_Date = input(" Please Enter the Ending_Date : ")
data = df[Starting_Date:Ending_Date]
But this doesn't work. Could you please take a look?

Please try this. The dates must be entered in year-month-day format, for example 2018-08-16.
from datetime import datetime
a = input('Starting_Date: ')
b = input('Ending_Date :')
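# Keep full datetime values: slicing a DatetimeIndex with datetime.date objects is deprecated in recent pandas
starting_date = datetime.strptime(a.strip(), "%Y-%m-%d")
ending_date = datetime.strptime(b.strip(), "%Y-%m-%d")
data = df.loc[starting_date:ending_date]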
Hope that works for you :)
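
As a minimal sketch of the same idea without the console prompts, the parsing and slicing can be wrapped in a small function. The function name slice_by_dates and the demo frame are hypothetical, and the sketch assumes the index holds midnight-normalized dates as in the question.

```python
import pandas as pd


def slice_by_dates(frame, start_text, end_text):
    """Return the rows of `frame` whose DatetimeIndex lies in [start, end]."""
    # strip() removes stray spaces that are easy to type at a prompt
    start = pd.to_datetime(start_text.strip(), format="%Y-%m-%d")
    end = pd.to_datetime(end_text.strip(), format="%Y-%m-%d")
    # Label slicing on a sorted DatetimeIndex includes both endpoints
    return frame.sort_index().loc[start:end]


# Hypothetical demo data with one row per day
demo = pd.DataFrame(
    {"value": [1, 2, 3, 4]},
    index=pd.to_datetime(["2018-07-03", "2018-07-04", "2018-07-05", "2018-07-06"]),
)
selected = slice_by_dates(demo, " 2018-07-04", "2018-07-05 ")
print(selected)
```

In the interactive version, the two strings would come from input() calls instead of literals.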

Related

Grouped dates are 30 days behind for Plotly Express line graphs

I have a list of daily transactions that I am trying to plot on a line graph. I decided to group by month and year and sum those groupings. The data plots on the Plotly line graph as expected except the end dates are 30 days behind. This makes it difficult if I want to add/subtract the dates to obtain a certain date range.
To get a certain date range, I am not using the grouped dates but the original dates and applying relativedelta. How can I resolve this?
import pandas as pd
from datetime import datetime, timedelta
import plotly.express as px
import sqlite3
import numpy as np
from dateutil.relativedelta import relativedelta
data = {
'Transaction_type':[ 'Debit', 'Debit', 'Credit','Debit','Debit','Debit', 'Debit', 'Credit','Debit','Debit'],
'Amount': [40,150,1000,60,80,120, 80, 1000,500,80]
}
df = pd.DataFrame(data)
df['Date'] = pd.date_range(start='6/1/2022',end='7/30/2022', periods = len(df))
df['Date'] = pd.to_datetime(df['Date'])
df['year_month'] = df['Date'].dt.strftime('%Y-%m')
#Income Expense Visual
Income_Expense = df.copy()
Income_Expense['Transaction_type'] = Income_Expense['Transaction_type'].replace({'Credit': 'Income', 'Debit': 'Expense'}) #Rename for line legend; the match is case-sensitive
Income_Expense = pd.pivot_table(Income_Expense, values = ['Amount'], index = ['Transaction_type', 'year_month'],aggfunc=sum).reset_index()
scatter_plot = px.line(Income_Expense, x = 'year_month', y = 'Amount', color = 'Transaction_type', title = 'Income and Expense', color_discrete_sequence= ['red','green'],
category_orders= {'Transaction_type': ['Expense', 'Income']}) #Keys must be column names
scatter_plot.update_layout(legend_traceorder = 'reversed')
scatter_plot.update_layout(yaxis_tickformat = ',')
scatter_plot.show()
The cause is the strftime() call, which converts your dates to strings. From that point on, Plotly treats each value as a string rather than a date, so the axis does not behave the way you want. You can run Income_Expense.info() to check the column types.
So, leave the dates in datetime format. pandas Grouper can group the dates at monthly frequency. You can then plot the result and specify the tick format so that the axis shows the dates the way you want. The updated code is below.
Note that Grouper works on the index by default (or on a column passed as key). So I first call set_index(), then group with Grouper at monthly frequency together with Transaction_type, then sum and call reset_index. This creates a dataframe that looks like the one you had, except that the dates are now datetime values, not strings.
import pandas as pd
from datetime import datetime, timedelta
import plotly.express as px
import sqlite3
import numpy as np
from dateutil.relativedelta import relativedelta
data = {'Transaction_type':[ 'Debit', 'Debit', 'Credit','Debit','Debit','Debit', 'Debit', 'Credit','Debit','Debit'], 'Amount': [40,150,1000,60,80,120, 80, 1000,500,80]}
df = pd.DataFrame(data)
df['Date'] = pd.date_range(start='6/1/2022',end='7/30/2022', periods = len(df))
df['Date'] = pd.to_datetime(df['Date'])
#Income Expense Visual
Income_Expense = df.copy()
Income_Expense['Transaction_type'] = Income_Expense['Transaction_type'].replace({'Credit': 'Income', 'Debit': 'Expense'}) #Rename for line legend; the match is case-sensitive
#Group by calendar month and transaction type; freq="M" labels each group with the month-end date (pandas 2.2+ spells this alias "ME")
Income_Expense = Income_Expense.set_index('Date').groupby([pd.Grouper(freq="M"), 'Transaction_type'])[['Amount']].sum().reset_index()
scatter_plot = px.line(Income_Expense, x = 'Date', y = 'Amount', color = 'Transaction_type', title = 'Income and Expense', color_discrete_sequence= ['red','green'],
category_orders= {'Transaction_type': ['Expense', 'Income']}) #Keys must be column names
scatter_plot.update_layout(legend_traceorder = 'reversed')
scatter_plot.update_layout(yaxis_tickformat = ',')
scatter_plot.update_xaxes(tickformat="%d-%b-%Y")
scatter_plot.show()
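
The grouping step can be checked on its own, without Plotly. The following is a small sketch with made-up transactions, not the asker's data. It uses the "MS" (month start) alias instead of month end, which is a swapped-in choice so that each group is labelled with the first day of its month, and it passes the column through key= so no set_index() call is needed.

```python
import pandas as pd

# Made-up transactions spanning two months
df = pd.DataFrame({
    "Transaction_type": ["Debit", "Debit", "Credit", "Debit", "Credit", "Debit"],
    "Amount": [40, 150, 1000, 60, 1000, 80],
    "Date": pd.to_datetime([
        "2022-06-01", "2022-06-15", "2022-06-30",
        "2022-07-05", "2022-07-20", "2022-07-30",
    ]),
})

# Group on the Date column at month-start frequency, plus the transaction type
monthly = (
    df.groupby([pd.Grouper(key="Date", freq="MS"), "Transaction_type"])["Amount"]
    .sum()
    .reset_index()
)
print(monthly)
print(monthly.dtypes)  # Date remains a datetime column, not strings
```

Because Date stays a datetime column, date arithmetic such as relativedelta offsets can be applied to the grouped values directly.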

Fill in Date when only knowing start date and continuous hours? Pandas

I have a dataframe built from a license log file. The log file records only consecutive hours. The header of the log file contains a start date, so every time the hour goes back to 0 a new day should begin. How can I solve this in Python?
Here is an example of what I have.
Left is the current structure, right is the expected output:
I immediately thought of a loop solution; there might be more pythonic ways though.
import pandas as pd
from datetime import timedelta
df=pd.read_csv('date_example.csv', parse_dates=['Date'])
for idx, row in df.iloc[1:].iterrows():
    if df.loc[idx,'Hour'] == 0:
        df.loc[idx,'Date']= df.loc[idx-1,'Date']+timedelta(days=1)
    else:
        df.loc[idx,'Date'] = df.loc[idx-1, 'Date']
You didn't add the raw data, so I created a similar example.
This solution assumes there are no days without data.
import pandas as pd
import datetime
import numpy as np
# example data
data = [[datetime.datetime(2021,10,28), 0,5], [np.nan, 1, 6], [np.nan, 23, 7], [np.nan, 1, 8]]
df = pd.DataFrame(data, columns = ['Date', 'Hour','License_Count'])
for i in range(1, len(df)):
    if df.iat[i,1] > df.iat[i-1,1]:
        # hour increased: still the same day
        df.loc[i,'Date'] = df.iat[i-1,0]
    else:
        # hour did not increase: a new day has started
        df.loc[i,'Date'] = df.iat[i-1,0] + datetime.timedelta(days=1)
I have done this by applying the below function.
import pandas as pd
from datetime import timedelta
df["Date"] = pd.to_datetime(df["Date"])
temp=df.copy()
def func(x):
    if x['Hours'] == 0:
        if x.name == 0:
            temp.loc[x.name, 'Date'] = temp.loc[0, 'Date'] + timedelta(days=1)
        else:
            temp.loc[x.name, 'Date'] = temp.loc[x.name - 1, 'Date'] + timedelta(days=1)
    else:
        temp.loc[x.name, 'Date'] = temp.loc[x.name - 1, 'Date']
df.apply(func, axis = 1)
print(temp)
"temp" is your desired output.
I used an Excel sheet, input.xlsx, that is similar to your input. The dates start at hour 0, so I didn't use the column with the hours.
The output is then stored in output.xlsx.
import pandas as pd
from datetime import timedelta
df = pd.read_excel("input.xlsx")
date = df['Date'][0]
for index, row in df.iterrows():
    df.loc[index, 'Date'] = date  # .loc avoids chained assignment
    date += timedelta(hours=1)
df.to_excel("output.xlsx")
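
The loop-based answers above can also be written without an explicit loop. The following is a sketch under the same assumption that no day is missing entirely; the function name fill_dates and the sample hours are hypothetical. It treats every drop in the hour value as the start of a new day and counts those drops cumulatively.

```python
import pandas as pd


def fill_dates(hours, start_date):
    """Build one date per row from a start date and a sequence of hours."""
    hours = pd.Series(hours).reset_index(drop=True)
    # A new day starts wherever the hour is lower than in the previous row
    rollover = hours.diff() < 0
    # Running count of rollovers = number of days since the start date
    day_offset = rollover.cumsum()
    return pd.to_timedelta(day_offset, unit="D") + pd.Timestamp(start_date)


dates = fill_dates([0, 1, 23, 0, 5, 0], "2021-10-28")
print(dates)
```

Detecting a drop rather than testing for hour 0 also handles logs in which hour 0 itself is missing for a given day.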

Manipulate Date from yfinance

When I pull stock data from yfinance, can I create other columns of data that manipulate the 'date' column? I am new to python and still learning a lot. I have created other columns using the stock price data, but I cannot figure out how to manipulate the 'date' column.
For example, 10/26/2020, I would like to create columns with the following data:
day_of_week, Monday = 1
year = 2020
month = 10
day = 26
week = 44
trade_day = 207
import pandas as pd
import numpy as np
import yfinance as yf
import pandas_datareader as pdr
import datetime as dt
import matplotlib.pyplot as plt
##Get stock price data
ticker = 'NVDA'
#Data time period
now = dt.datetime.now()
startyear = 2017
startmonth=1
startday=1
start = dt.datetime(startyear, startmonth, startday)
#get data from YFinance
df = pdr.get_data_yahoo(ticker, start, now)
#create a column
df['% Change'] = (df['Adj Close'] / df['Adj Close'].shift(1))-1
df['Range'] = df['High'] - df['Low']
df
You want to use the index of your dataframe, which is of type pd.DatetimeIndex.
To split the date into new columns:
new_df = df.copy()
new_df['year'], new_df['month'], new_df['day'] = df.index.year, df.index.month, df.index.day
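To carry out arithmetic operations relative to the first trade date:
start_date = df.index.min()
new_df['trade_day'] = df.index.day - start_date.day  # difference in day-of-month, not elapsed days
new_df['trade_week'] = df.index.isocalendar().week.astype(int) - start_date.week  # DatetimeIndex.week was removed in pandas 2.0
new_df['trade_year'] = df.index.year - start_date.year
new_df['day_of_week'] = df.index.weekday + 1  # pandas counts Monday as 0; +1 gives Monday = 1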
new_df['days_in_month'] = df.index.days_in_month
new_df['day_name'] = df.index.day_name()
new_df['month_name'] = df.index.month_name()
To choose another start date:
start_date = pd.to_datetime('2017-01-01')
I figured out most of the problem, but I still cannot work out how to calculate the trade day.
#Convert the 'Date' Index to 'Date' Column
df.reset_index(inplace=True)
#Create columns manipulating 'Date'
df['Year'] = df['Date'].dt.year
df['Month'] = df['Date'].dt.month
df['Day'] = df['Date'].dt.day
df['Week of Year'] = df['Date'].dt.isocalendar().week
df['Day of Week'] = df['Date'].dt.dayofweek
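
For the remaining trade day column, one possible reading of the example in the question (10/26/2020 gives 207) is the running count of trading rows within each calendar year. The following is a sketch under that assumption, using a synthetic business-day index in place of downloaded prices. The names prices and features are hypothetical, and the count is only meaningful for years whose data starts on the first trading day.

```python
import pandas as pd

# Synthetic stand-in for downloaded price data: business days only
# (real exchange holidays are NOT removed here)
index = pd.bdate_range("2019-12-30", "2020-01-06", name="Date")
prices = pd.DataFrame({"Adj Close": range(len(index))}, index=index)

features = pd.DataFrame(index=prices.index)
features["year"] = prices.index.year
features["month"] = prices.index.month
features["day"] = prices.index.day
# pandas counts Monday as 0, so add 1 to get Monday = 1
features["day_of_week"] = prices.index.dayofweek + 1
# ISO week number
features["week"] = prices.index.isocalendar().week.astype(int)
# Position of each row within its calendar year, starting at 1
features["trade_day"] = prices.groupby(prices.index.year).cumcount() + 1
print(features)
```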

Sort by date with Excel file and Pandas

I am trying to sort my Excel file by the date column. When the code runs, it converts the cells from text strings to datetimes and sorts them, but only within the same month. That is, when I have dates from both September and October, each month is sorted separately.
I have searched all over Google and YouTube.
import pandas as pd
import datetime
from datetime import timedelta
x = datetime.datetime.now()
excel_workbook = 'data.xlsx'
sheet1 = pd.read_excel(excel_workbook, sheet_name='RAW DATA')
sheet1['Call_DateTime'] = pd.to_datetime(sheet1['Call_DateTime'])
sheet1.sort_values(sheet1['Call_DateTime'], axis=1, ascending=True, inplace=True)
sheet1['SegmentDuration'] = pd.to_timedelta(sheet1['SegmentDuration'], unit='s')
sheet1['SegmentDuration'] = timedelta(hours=0.222)
sheet1.style.apply('h:mm:ss', column=['SegmentDuration'])
sheet1.to_excel("S4x Output"+x.strftime("%m-%d")+".xlsx", index = False)
print("All Set!!")
I would like it to sort oldest to newest.
Updated code that works:
import pandas as pd
import datetime
from datetime import timedelta
x = datetime.datetime.now()
excel_workbook = 'data.xlsx'
sheet1 = pd.read_excel(excel_workbook, sheet_name='RAW DATA')
sheet1['Call_DateTime'] = pd.to_datetime(sheet1['Call_DateTime'])
sheet1.sort_values(['Call_DateTime'], axis=0, ascending=True, inplace=True)
sheet1['SegmentDuration'] = pd.to_timedelta(sheet1['SegmentDuration'], unit='s')
sheet1['SegmentDuration'] = timedelta(hours=0.222)
sheet1.style.apply('h:mm:ss', column=['SegmentDuration'])
sheet1.to_excel("S4x Output"+x.strftime("%m-%d")+".xlsx", index = False)
print("All Set!!")
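
The key change is passing the column name with axis=0 so that rows are ordered by the parsed datetime values. A small sketch with made-up rows (the Agent column and the date format are assumptions) shows why parsing matters: sorted as text, these values would come out in the order 01/15/2020, 06/01/2019, 12/30/2019.

```python
import pandas as pd

# Made-up call log with dates stored as month/day/year text
calls = pd.DataFrame({
    "Call_DateTime": ["01/15/2020 09:15", "12/30/2019 14:00", "06/01/2019 08:30"],
    "Agent": ["A", "B", "C"],
})

# Parse the text into datetimes, then sort the rows oldest to newest
calls["Call_DateTime"] = pd.to_datetime(calls["Call_DateTime"], format="%m/%d/%Y %H:%M")
calls = calls.sort_values("Call_DateTime", ascending=True).reset_index(drop=True)
print(calls)
```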

KeyError: 'Date'

import pandas as pd
import numpy as np
from nsepy import get_history
import datetime as dt
start = dt.datetime(2015, 1, 1)
end = dt.datetime.today()
infy = get_history(symbol='INFY', start = start, end = end)
infy.index = pd.to_datetime(infy.index)
infy.head()
infy_volume = infy.groupby(infy['Date'].dt.year).reset_index().Volume.sum()
This raises KeyError: 'Date', but infy_volume should be a multi-index Series with two index levels, Year and Month.
Here the date column is the index, so use:
infy.groupby(infy.index.year).Volume.sum().reset_index()
If you want to group by both year and month, use:
infy_volume = infy.groupby([infy.index.year, infy.index.month]).Volume.sum()
infy_volume.index = infy_volume.index.set_names(['Year', 'Month'])  # name both levels
print(infy_volume)
# infy_volume.reset_index()
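
The same grouping can be tried without downloading anything. This is a sketch on a tiny made-up frame standing in for the nsepy result; the name prices and the volume figures are hypothetical.

```python
import pandas as pd

# Made-up stand-in for the downloaded history, indexed by date
idx = pd.to_datetime(["2015-01-05", "2015-01-20", "2015-02-03", "2016-01-04"])
prices = pd.DataFrame({"Volume": [100, 200, 50, 400]}, index=idx)
prices.index.name = "Date"

# Group on attributes of the DatetimeIndex instead of a 'Date' column
volume = prices.groupby([prices.index.year, prices.index.month])["Volume"].sum()
volume.index = volume.index.set_names(["Year", "Month"])
print(volume)
```

Individual months can then be looked up with a (year, month) tuple, for example volume.loc[(2015, 1)].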
