Change only the date value in datetime column index - python

I have a dataframe with per-second data. The original format of the time values is '%H:%M:%S'; however, when I applied pd.to_datetime, a date was automatically added to that column.
I would like to replace that default date with the year, month and day values I read from the CSV file. I formatted it as '%Y-%m-%d'.
I do not know how to get the right date into the datetime column I set as the index. Note: the date must be the same for every row because it is daily data.
from datetime import datetime
import pandas as pd
df = pd.read_csv(url, header=None, index_col=0)
year = int(df.iloc[2][1])
month = int(df.iloc[2][2])
day = int(df.iloc[2][3])
df.index.name = None
df.drop(index= df.iloc[:7, :].index.tolist(), inplace=True)
df.drop(columns=df.columns[-1], axis = 1, inplace=True)
df.columns = ['Name Column 1','Name Column 2']
d = pd.to_datetime((datetime(year, month, day).date()), format = '%Y-%m-%d')
df.index = pd.to_datetime(df.index, format='%H:%M:%S')
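One possible approach, sketched under assumptions: treat the '%H:%M:%S' strings as offsets from midnight and add them to a single Timestamp built from year, month and day. The sample frame and the date values below are made up for illustration; in the real script they would come from the CSV as in the code above.

```python
import pandas as pd

# Hypothetical stand-ins for the values read from the CSV header rows.
year, month, day = 2021, 3, 5

# Hypothetical sample data whose index still holds the raw time strings.
df = pd.DataFrame(
    {"Name Column 1": [1.0, 2.0, 3.0], "Name Column 2": [4.0, 5.0, 6.0]},
    index=["00:00:00", "00:00:01", "23:59:59"],
)

# Midnight of the desired day.
day_start = pd.Timestamp(year=year, month=month, day=day)

# Parse the times as durations since midnight and shift them onto that day.
df.index = day_start + pd.to_timedelta(df.index)

print(df)
```

This avoids the placeholder date entirely, because the time strings are never parsed as full datetimes.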

Related

python error: can only concatenate str (not "datetime.timedelta") to str

I am trying to get the weeks between two dates and split them into rows by week, and here is the error message I got:
can only concatenate str (not "datetime.timedelta") to str
Can anyone help with this one? Thanks!
import datetime
import pandas as pd
df=pd.read_csv(r'C:\Users\xx.csv')
print(df)
# Convert dataframe columns to dates
df['Start Date'] = pd.to_datetime(df['start_date'])
df['End Date'] = pd.to_datetime(df['end_date'])
df_out = pd.DataFrame()
week = 7
# Iterate over dataframe rows
for index, row in df.iterrows():
    date = row["start_date"]
    date_end = row["end_date"]
    dealtype = row["deal_type"]
    ppg = row["PPG"]
    # Get the weeks for the row
    while date < date_end:
        date_next = date + datetime.timedelta(week - 1)
        df_out = df_out.append([[dealtype, ppg, date, date_next]])
        date = date_next + datetime.timedelta(1)
# Remove extra index and assign columns as original dataframe
df_out = df_out.reset_index(drop=True)
df_out.columns = df.columns
df.to_csv(r'C:\Users\Output.csv', index=None)
In the loop, date comes from row["start_date"], which holds a string, not a Timestamp.
datetime.timedelta(week - 1) is a datetime.timedelta object.
Python cannot add a str and a timedelta, which is exactly what the error message says.
Wrapping both in str() makes the error go away, but it only joins two pieces of text and does not advance the date:
date_next = str(date) + str(datetime.timedelta(week - 1))
So for date arithmetic, make sure the values are datetimes before adding the timedelta.
You converted the start_date and end_date column to datetime, but you added the converted columns as Start Date and End Date. Then, in the loop, you fetch row["start_date"], which is still a string. If you want to REPLACE the start_date column, then don't give it a new name. Spelling matters.
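A minimal sketch of the loop with the columns converted in place, assuming a small made-up frame instead of the CSV. It collects rows in a plain list rather than calling DataFrame.append, since append was removed in pandas 2.0; that swap is my choice, not part of the question.

```python
import pandas as pd

# Hypothetical sample input standing in for the CSV file.
df = pd.DataFrame({
    "deal_type": ["A"],
    "PPG": ["x"],
    "start_date": ["2020-01-01"],
    "end_date": ["2020-01-20"],
})

# Overwrite the original columns so the loop sees Timestamps, not strings.
df["start_date"] = pd.to_datetime(df["start_date"])
df["end_date"] = pd.to_datetime(df["end_date"])

rows = []
for _, row in df.iterrows():
    date = row["start_date"]
    # Emit one row per 7-day block until the end date is passed.
    while date < row["end_date"]:
        date_next = date + pd.Timedelta(days=6)
        rows.append([row["deal_type"], row["PPG"], date, date_next])
        date = date_next + pd.Timedelta(days=1)

df_out = pd.DataFrame(rows, columns=["deal_type", "PPG", "start_date", "end_date"])
print(df_out)
```

As in the original loop, the last block can run past end_date; clip it if that matters for your data.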

Pandas - Select month and year

I am trying to subset a dataframe; ultimately I want to export a certain month and year (say November 2020) to a CSV. But I'm stuck at the selection part. The date column is in DD/MM/YYYY format. My attempt:
csv = r"C:\Documents\Transactions.csv"
current_month = 11
current_year = 2020
data =pd.read_csv(csv, sep=',', index_col = None)
df = data[pd.to_datetime(data['Date'],dayfirst=True).dt.month == current_month &(pd.to_datetime(data['Date']).dt.year==current_year)]
print(df)
The result is the rows with the correct year, but it includes all months, whereas I want it restricted to the current_month variable. Any help appreciated.
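Given that you have a Date column, I would suggest converting the column once up front, since you currently convert it twice (and only the first conversion passes dayfirst=True). Note also that & binds more tightly than ==, so each comparison needs its own parentheses; in your attempt current_month & (...) is evaluated before the == comparison.
Then just apply the comparisons to the Series.
import pandas as pd
data['Date'] = pd.to_datetime(data['Date'], dayfirst=True)
df = data[(data['Date'].apply(lambda x: x.month) == current_month) &
          (data['Date'].apply(lambda y: y.year) == current_year)]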
Convert the Date column to datetime first, then do the selection as usual.
import pandas as pd
df = pd.read_csv('data-date.txt')
current_month = 11
current_year = 2020
df['Date'] = pd.to_datetime(df['Date'], dayfirst=True)
df[(df['Date'].dt.month == current_month) & (df['Date'].dt.year == current_year)]
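For a quick check of the selection, here is a small self-contained sketch. The sample rows and the Amount column are invented, and it passes an explicit format instead of dayfirst=True, which is an assumption that every value is strictly DD/MM/YYYY.

```python
import pandas as pd

current_month = 11
current_year = 2020

# Hypothetical sample data in DD/MM/YYYY format.
data = pd.DataFrame({
    "Date": ["05/11/2020", "17/10/2020", "30/11/2020", "12/11/2019"],
    "Amount": [10, 20, 30, 40],
})

# Parse once; an explicit format removes any day/month ambiguity.
dates = pd.to_datetime(data["Date"], format="%d/%m/%Y")

# Each comparison is parenthesised because & binds tighter than ==.
mask = (dates.dt.month == current_month) & (dates.dt.year == current_year)
november = data[mask]

print(november)
```

From there, november.to_csv(...) would write only the selected rows.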

Pandas mistake while sorting values

I'm trying to sort my dataframe based on the 'date' and 'hour' columns. It's sorting 01/11/2020 before dates like 24/10/2020.
df = pd.read_csv("some_folder")
df = df.sort_values(by = ['date','hour']).reset_index(drop=True)
In the picture you can see the sorting error.
Try to convert the column date to datetime before sorting (pd.to_datetime):
df = pd.read_csv("some_folder")
df['date'] = pd.to_datetime(df['date'], dayfirst=True) # <-- convert the column to `datetime`
df = df.sort_values(by = ['date','hour']).reset_index(drop=True)
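To see why the conversion matters, here is a small sketch with made-up rows: as strings, the dates are compared character by character, so "01/..." sorts before "24/..." regardless of the month. The explicit format is an assumption that the column is strictly DD/MM/YYYY.

```python
import pandas as pd

# Hypothetical sample data; dates are DD/MM/YYYY strings.
df = pd.DataFrame({
    "date": ["01/11/2020", "24/10/2020", "24/10/2020"],
    "hour": [3, 5, 1],
})

# Sorting the raw strings compares text, so 1 November lands first.
as_text = df.sort_values(by=["date", "hour"]).reset_index(drop=True)

# After conversion the sort is chronological.
df["date"] = pd.to_datetime(df["date"], format="%d/%m/%Y")
as_dates = df.sort_values(by=["date", "hour"]).reset_index(drop=True)

print(as_text)
print(as_dates)
```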

Set new column from datetime on dataframe pandas

I am trying to set new columns (day of year and hour).
My datetime consists of a date and an hour. I tried to split it up by using
data['dayofyear'] = data['Date'].dt.dayofyear
and
df['Various', 'Day'] = df.index.dayofyear
df['Various', 'Hour'] = df.index.hour
but it always returns an error. I'm not sure how I can split this up and get it into new columns.
I think the problem is that there is no DatetimeIndex, so use to_datetime first and then assign to the new column names:
df.index = pd.to_datetime(df.index)
df['Day'] = df.index.dayofyear
df['Hour'] = df.index.hour
Or use DataFrame.assign:
df.index = pd.to_datetime(df.index)
df = df.assign(Day = df.index.dayofyear, Hour = df.index.hour)
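If the datetimes live in a regular column rather than the index, as in the first attempt in the question, the same idea should work through the .dt accessor once the column is converted. A small sketch with invented values:

```python
import pandas as pd

# Hypothetical sample data with the datetimes stored as strings in a column.
data = pd.DataFrame({"Date": ["2021-02-01 05:00:00", "2021-12-31 23:00:00"]})

# .dt is only available on datetime-like columns, so convert first.
data["Date"] = pd.to_datetime(data["Date"])

data["dayofyear"] = data["Date"].dt.dayofyear
data["hour"] = data["Date"].dt.hour

print(data)
```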

"day is out of range for month" error, suspecting it's caused by a leap year

My code doesn't seem to understand that there is a leap year. The code works just fine on non-leap-year data. The other problem I am having is that when I print out the data, the year is set to 1900 instead of the actual year.
def processing(chunk):  # chunk being read in (by chunksize)
    chunk['Date'] = pd.to_datetime(chunk['Date'], format='%Y-%m-%d')
    chunk['Year'] = chunk['Date'].dt.year.rename('Year')  # creates a new column with the year
    chunk['Month'] = chunk['Date'].dt.month.rename('Month')  # new column with month
    chunk['Day'] = chunk['Date'].dt.day.rename('Day')  # new column with day
    chunk.drop('Date', 1, inplace=True)
    return
df = pd.read_csv('NLDN_CONUS_flash_and_cloud_2012_dT4KMG.txt',
                 delim_whitespace=True,
                 names=["Date", "Time", "Latitude", "Longitude", "Current", "Multi", "Type"],
                 chunksize=2000000, nrows=2000000)
chunk_list = []
for chunk in df:
    chunk_list.append(chunk)
df_concat = pd.concat(chunk_list)
df_concat['Date'] = pd.to_datetime(df_concat['Date'], format='%Y-%m-%d')
df_concat['month-day'] = df_concat['Date'].dt.strftime('%m-%d')
df_concat['Datetime'] = df_concat['month-day'] + ' ' + df_concat['Time']
df_concat = df_concat[['Datetime', 'Latitude', 'Longitude', 'Current',
                       'Multi', 'Type']]
df_concat['Datetime'] = pd.to_datetime(df_concat['Datetime'], format='%m-%d %H:%M:%S.%f')
df_concat.set_index(df_concat['Datetime'], inplace=True)
print(df_concat)
ValueError: day is out of range for month
In
df_concat['Datetime'] = pd.to_datetime(df_concat['Datetime'], format='%m-%d %H:%M:%S.%f')
you are converting to a datetime without any information about the year. So pandas assumes the default year, which is 1900. Since 1900 was not a leap year, 02-29 is not a valid date in it, which is what raises the error.
I suggest using the complete datetime as the index, then grouping by dayofyear or whatever is needed.
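A rough sketch of that suggestion, using two invented rows in place of the real file: keep the year when building the Datetime column, then group on the index's dayofyear. The column names follow the question; the values are made up.

```python
import pandas as pd

# Hypothetical sample rows around the leap day of 2012.
df = pd.DataFrame({
    "Date": ["2012-02-28", "2012-02-29"],
    "Time": ["23:59:59.500", "00:00:01.250"],
    "Current": [12.5, -8.0],
})

# Build the full timestamp, year included, so 29 February is valid.
df["Datetime"] = pd.to_datetime(
    df["Date"] + " " + df["Time"], format="%Y-%m-%d %H:%M:%S.%f"
)
df = df.set_index("Datetime")

# Group on day of year without ever dropping the real year.
per_day = df.groupby(df.index.dayofyear).size()

print(df)
print(per_day)
```

This also takes care of the 1900 problem, since the printed index carries the actual year.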
