Date creation using different columns in pandas - python

I have a data frame like the one below:
Day  Month and year
13   septiembre /98
15   August/98
24   Novem /98
Is it possible to merge the day with the month and year to create a new column, like this?
Day  Month and year  Date
13   septiembre /98  13-09-98
15   August/98       15-08-98
24   Nov /98         24-11-98

I was able to build a pandas Series by converting the data you provided from two lists into a new, correctly formatted list, and then creating the Series from that list. I'm not sure if that's what you wanted, but I hope it helps:
import pandas as pd

day = [13, 15, 24]
monthyear = ['September/98', 'August/98', 'November/98']

daymonthyear_new = []
for d, my in zip(day, monthyear):
    # Split "Month/yy" into its month and year parts
    month, year = my.split("/")
    string = str(d) + "-" + month + "-" + year
    print('string= ', string)
    daymonthyear_new.append(string)
print('daymonthyear_new= ', daymonthyear_new)

dates = pd.Series(daymonthyear_new)
dates

You could use string slicing and concatenation, provided that your dataset comes in a predictable, standard format, and then convert the resulting string to datetime with pd.to_datetime.
For example, this works for your sample data, because the first three letters of each month name match the English abbreviation:
import pandas as pd
df = pd.DataFrame([[13, 'Septiembre /98'], [15, 'August/98'], [24, 'Novem /98']], columns=["Day", "Month and year"])
df['Date'] = pd.to_datetime(
    df['Day'].astype('str') +
    ' - ' +
    df['Month and year'].str.slice(0, 3) +
    ' - ' +
    df['Month and year'].str.slice(-2)
)
print(df)
Day Month and year Date
0 13 Septiembre /98 1998-09-13
1 15 August/98 1998-08-15
2 24 Novem /98 1998-11-24
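If the month names do not all start with an English three-letter abbreviation, slicing alone is not enough. The sketch below is one possible variation, not part of either answer: it maps the first three letters to a month number through a lookup table. The MONTH_PREFIXES table and the rule that every two-digit year belongs to the 1900s are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical lookup: first three letters (lower-cased) -> month number.
# Extend it with whatever spellings actually occur in the data.
MONTH_PREFIXES = {
    "jan": 1, "ene": 1, "feb": 2, "mar": 3, "apr": 4, "abr": 4,
    "may": 5, "jun": 6, "jul": 7, "aug": 8, "ago": 8,
    "sep": 9, "oct": 10, "nov": 11, "dec": 12, "dic": 12,
}

df = pd.DataFrame(
    [[13, "septiembre /98"], [15, "August/98"], [24, "Novem /98"]],
    columns=["Day", "Month and year"],
)

# Split "name /yy" into its two parts and strip stray whitespace.
parts = df["Month and year"].str.split("/", expand=True)
month = parts[0].str.strip().str.lower().str[:3].map(MONTH_PREFIXES)
year = 1900 + parts[1].str.strip().astype(int)  # assumption: every year is 19xx

# pd.to_datetime can assemble dates from columns named year, month and day.
df["Date"] = pd.to_datetime(
    pd.DataFrame({"year": year, "month": month, "day": df["Day"]})
)

# Text version in the dd-mm-yy layout shown in the question.
df["Date as text"] = df["Date"].dt.strftime("%d-%m-%y")
print(df)
```

Keeping the real datetime column next to the text column means the dates can still be sorted or plotted later.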

Related

How to convert date format (dd/mm/yyyy) to days in python csv

I need a function to count the total number of days in the 'days' column between a start date of 1st Jan 1995 and an end date of 31st Dec 2019 in a dataframe taking leap years into account as well.
Example: 1st Jan 1995 is day 1, 1st Feb 1995 is day 32, and so on all the way to 31st Dec 2019.
If you want to filter a pandas dataframe to a range between two dates, you can do this:
start_date = '1995/01/01'
end_date = '1995/02/01'
df = df[ (df['days']>=start_date) & (df['days']<=end_date) ]
len(df) then gives the number of rows in the filtered dataframe.
If instead you want to calculate the number of days between two dates, you can do it without pandas, using datetime:
from datetime import datetime
start_date = '1995/01/01'
end_date = '1995/02/01'
delta = datetime.strptime(end_date, '%Y/%m/%d') - datetime.strptime(start_date, '%Y/%m/%d')
print(delta.days)
Output:
31
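Because datetime subtraction works on real calendar dates, leap years are already taken into account. To number the days so that 1st Jan 1995 is day 1, add 1 to delta.days.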
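The day numbering asked for in the question can also be computed for a whole column at once. This is a minimal sketch, assuming the column is called days and holds dd/mm/yyyy strings as in the question title; the sample values are made up for illustration.

```python
import pandas as pd

# Made-up sample values in dd/mm/yyyy format.
df = pd.DataFrame(
    {"days": ["01/01/1995", "01/02/1995", "29/02/1996", "31/12/2019"]}
)

# Parse the strings with an explicit format so day and month are not swapped.
dates = pd.to_datetime(df["days"], format="%d/%m/%Y")

# Subtracting a Timestamp gives timedeltas; .dt.days is the whole-day count.
start = pd.Timestamp("1995-01-01")
df["day_number"] = (dates - start).dt.days + 1  # +1 so the start date is day 1

print(df)
```

The subtraction follows the calendar, so 29th Feb 1996 gets its own number (425) and 31st Dec 2019 comes out as day 9131.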

How to get a date from year, month, week of month and Day of week in Pandas?

I have a Pandas dataframe, which looks like below
I want to create a new column, which tells the exact date from the information from all the above columns. The code should look something like this:
df['Date'] = pd.to_datetime(df['Month']+df['WeekOfMonth']+df['DayOfWeek']+df['Year'])
I was able to find a workaround for your case. You will need to define the dictionaries for the months and the days of the week.
month = {"Jan":"01", "Feb":"02", "March":"03", "Apr": "04", "May":"05", "Jun":"06", "Jul":"07", "Aug":"08", "Sep":"09", "Oct":"10", "Nov":"11", "Dec":"12"}
week = {"Monday":1,"Tuesday":2,"Wednesday":3,"Thursday":4,"Friday":5,"Saturday":6,"Sunday":7}
With these dictionaries, the transformation I used on a custom dataframe was:
rows = [["Dec",5,"Wednesday", "1995"],
["Jan",3,"Wednesday","2013"]]
df = pd.DataFrame(rows, columns=["Month","Week","Weekday","Year"])
df['Date'] = (df["Year"] + "-" + df["Month"].map(month) + "-" + (df["Week"].apply(lambda x: (x - 1)*7) + df["Weekday"].map(week).apply(int) ).apply(str)).astype('datetime64[ns]')
However, you have to be careful. For some of the data you posted as an example, the computed day exceeds the number of days in the month. For example, for
row = ["Oct",5,"Friday","2018"]
the date produced is 2018-10-33, which is not a valid date. I recommend adding some logic to filter your data in order to avoid this kind of problem.
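One way to add that filtering logic is to let pandas reject impossible dates instead of raising an error. The sketch below reuses the dictionaries from this answer and swaps astype('datetime64[ns]') for pd.to_datetime with errors="coerce", which turns unparseable strings into NaT. Treat it as one option under these assumptions rather than the answer's own method.

```python
import pandas as pd

month = {"Jan": "01", "Feb": "02", "March": "03", "Apr": "04", "May": "05",
         "Jun": "06", "Jul": "07", "Aug": "08", "Sep": "09", "Oct": "10",
         "Nov": "11", "Dec": "12"}
week = {"Monday": 1, "Tuesday": 2, "Wednesday": 3, "Thursday": 4,
        "Friday": 5, "Saturday": 6, "Sunday": 7}

rows = [["Dec", 5, "Wednesday", "1995"],
        ["Oct", 5, "Friday", "2018"]]
df = pd.DataFrame(rows, columns=["Month", "Week", "Weekday", "Year"])

# Same day-of-month estimate as in the answer: full weeks plus weekday number.
day = (df["Week"] - 1) * 7 + df["Weekday"].map(week)
text = df["Year"] + "-" + df["Month"].map(month) + "-" + day.astype(str)

# errors="coerce" replaces impossible dates such as 2018-10-33 with NaT.
df["Date"] = pd.to_datetime(text, format="%Y-%m-%d", errors="coerce")

# Rows that produced an invalid date can then be inspected or dropped.
invalid = df[df["Date"].isna()]
print(df)
print(invalid)
```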
Let's approach it in 3 steps:
1. Get the first day of the month, Month_Start, from Year and Month.
2. Calculate the offset in days, DateOffset, relative to Month_Start from WeekOfMonth and DayOfWeek.
3. Get the actual date, Date, from Month_Start and DateOffset.
Here's the code:
import time

df['Month_Start'] = pd.to_datetime(df['Year'].astype(str) + df['Month'] + '01', format="%Y%b%d")
df['DateOffset'] = (
    (df['WeekOfMonth'] - 1) * 7
    + df['DayOfWeek'].map(lambda x: time.strptime(x, '%A').tm_wday)
    - df['Month_Start'].dt.dayofweek
)
df['Date'] = df['Month_Start'] + pd.to_timedelta(df['DateOffset'], unit='D')
Output:
Month WeekOfMonth DayOfWeek Year Month_Start DateOffset Date
0 Dec 5 Wednesday 1995 1995-12-01 26 1995-12-27
1 Jan 3 Wednesday 2013 2013-01-01 15 2013-01-16
2 Oct 5 Friday 2018 2018-10-01 32 2018-11-02
3 Jun 2 Saturday 1980 1980-06-01 6 1980-06-07
4 Jan 5 Monday 1976 1976-01-01 25 1976-01-26
The Date column now contains the dates derived from the information from other columns.
You can remove the working interim columns, if you like, as follows:
df = df.drop(['Month_Start', 'DateOffset'], axis=1)
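The three steps can be wrapped in a small helper that also flags rows whose result lands outside the requested month, as happens for the October 2018 row in the output above. This is a sketch: the function name add_date, the WEEKDAY_NUMBER lookup and the InMonth column are hypothetical names introduced here, and the %b format assumes English month abbreviations in the current locale.

```python
import pandas as pd

# Hypothetical lookup replacing time.strptime: Monday=0 ... Sunday=6,
# the same numbering that Series.dt.dayofweek uses.
WEEKDAY_NUMBER = {
    "Monday": 0, "Tuesday": 1, "Wednesday": 2, "Thursday": 3,
    "Friday": 4, "Saturday": 5, "Sunday": 6,
}


def add_date(df):
    """Return a copy of df with a Date column and an InMonth flag."""
    out = df.copy()
    # Step 1: first day of the month.
    month_start = pd.to_datetime(
        out["Year"].astype(str) + out["Month"] + "01", format="%Y%b%d"
    )
    # Step 2: offset in days relative to the first day of the month.
    offset = (
        (out["WeekOfMonth"] - 1) * 7
        + out["DayOfWeek"].map(WEEKDAY_NUMBER)
        - month_start.dt.dayofweek
    )
    # Step 3: the actual date.
    out["Date"] = month_start + pd.to_timedelta(offset, unit="D")
    # False when the computed date fell into a neighbouring month.
    out["InMonth"] = out["Date"].dt.month == month_start.dt.month
    return out


rows = [["Dec", 5, "Wednesday", 1995], ["Oct", 5, "Friday", 2018]]
df = pd.DataFrame(rows, columns=["Month", "WeekOfMonth", "DayOfWeek", "Year"])
result = add_date(df)
print(result)
```

Whether a date that spills into the next month should be kept, dropped or corrected depends on how WeekOfMonth was defined in the original data, so the flag only marks such rows.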

Extracting date components in pandas series

I have problems with transforming a Pandas dataframe column with dates to a number.
import matplotlib.dates
import datetime

for x in arsenalchelsea['Datum']:
    year = int(x[:4])
    month = int(x[5:7])
    day = int(x[8:10])
    hour = int(x[11:13])
    minute = int(x[14:16])
    sec = int(x[17:19])
    arsenalchelsea['floatdate']=date2num(datetime.datetime(year, month, day, hour, minute, sec))
arsenalchelsea
I want to make a new column in my dataframe with the dates as numbers, because I want to make a line graph later with the date on the x-axis.
This is the format of the date:
2017-11-29 14:06:45
Does anyone have a solution for this problem?
Slicing strings to get date components is bad practice. You should convert to datetime and extract directly.
In this case, it seems you can just use pd.to_datetime, but below I also demonstrate how you can extract the various components once you have performed the conversion.
import pandas as pd

df = pd.DataFrame({'Date': ['2017-01-15 14:55:42', '2017-11-10 12:15:21', '2017-12-05 22:05:45']})
df['Date'] = pd.to_datetime(df['Date'])
# Build a tuple of components per row, then expand the tuples into columns
df[['year', 'month', 'day', 'hour', 'minute', 'sec']] = \
    df['Date'].apply(lambda x: (x.year, x.month, x.day, x.hour, x.minute, x.second)).apply(pd.Series)
Result:
Date year month day hour minute sec
0 2017-01-15 14:55:42 2017 1 15 14 55 42
1 2017-11-10 12:15:21 2017 11 10 12 15 21
2 2017-12-05 22:05:45 2017 12 5 22 5 45
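The same components can also be read through the .dt accessor, which avoids the row-by-row apply. This is an alternative sketch rather than the answer's own code; the column names simply match the attribute names.

```python
import pandas as pd

df = pd.DataFrame(
    {"Date": ["2017-01-15 14:55:42", "2017-11-10 12:15:21", "2017-12-05 22:05:45"]}
)
df["Date"] = pd.to_datetime(df["Date"])

# Each name is an attribute of the .dt accessor on a datetime column.
for part in ["year", "month", "day", "hour", "minute", "second"]:
    df[part] = getattr(df["Date"].dt, part)

print(df)
```

For a line graph, the converted Date column can usually be passed to the plotting call directly, so a separate numeric column may not be needed.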

Preserving a Month and Day as Date Format in Python Pandas

I'm trying to take a column in yyyy-mm-dd format and convert it to mm-dd format (or MON DD, that works too), while preserving a date or numeric format. I've tried to use pd.to_datetime, but it seems that doesn't work because it requires the year, so it ends up padding the new columns with year 1900. I'm not looking for a conversion in which the new column is an object, because I need to use the column to plot later on. What's the best approach? The data frame is pretty small.
OldDate NewDate1 NewDate2 NewDate3
2017-01-02 01-02 01/02 Jan 2
2015-05-14 05-14 05/14 May 14
Let's say you have:
df = pd.DataFrame({"OldDate":["2017-01-02","2015-05-14"]})
df
OldDate
0 2017-01-02
1 2015-05-14
Then you can do:
from datetime import datetime as dt
df['OldDate'] = df.OldDate.apply(lambda s: dt.strptime(s, "%Y-%m-%d"))
df['NewDate1'] = df.OldDate.dt.strftime("%m-%d")
df['NewDate2'] = df.OldDate.dt.strftime("%m/%d")
df['NewDate3'] = df.OldDate.dt.strftime("%b %d")
df
OldDate NewDate1 NewDate2 NewDate3
0 2017-01-02 01-02 01/02 Jan 02
1 2015-05-14 05-14 05/14 May 14
You can also slice the OldDate string directly, as below (note that the results are plain strings, not dates):
OldDate = '2017-01-02'
NewDate1=OldDate[5:]
print(NewDate1) # This will give result as : "01-02"
NewDate2 = OldDate[5:7] + "/" + OldDate[8:10]
print(NewDate2) # This will give result as "01/02"
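Both approaches above produce strings, while the question asks for a column that is still date-like for plotting. One possible workaround, which neither answer proposes, is to move every date into the same reference year so that only month and day differ. The year 2000 is an arbitrary choice made for this sketch (it is a leap year, so 29 February stays valid), and the MonthDay and Label column names are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"OldDate": ["2017-01-02", "2015-05-14"]})
old = pd.to_datetime(df["OldDate"], format="%Y-%m-%d")

# Rebuild each date in the reference year 2000; the column stays datetime.
df["MonthDay"] = pd.to_datetime(
    pd.DataFrame({"year": 2000, "month": old.dt.month, "day": old.dt.day})
)

# A text label for display only; use MonthDay itself for sorting or plotting.
df["Label"] = df["MonthDay"].dt.strftime("%b %d")
print(df)
```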

Get the average year (mean of days over multiple years) in Pandas

I am new to Pandas time series and dataframes and am struggling to get this simple task done.
I have a dataset "data" (1-dimensional float32-Numpy array) for each day from 1/1/2004 - 12/31/2008. The dates are stored as a list of datetime objects "dates".
Basically, I would like to calculate a complete "standard year" - the average value of each day of all years (1-365).
I started from this similar (?) question (Getting the average of a certain hour on weekdays over several years in a pandas dataframe), but could not get to the desired result - a time series of 365 "average" days, e.g. the average of all four 1st of January's, 2nd of January's ...
A small example script:
import numpy as np
import pandas as pd
import datetime
startdate = datetime.datetime(2004, 1, 1)
enddate = datetime.datetime(2008, 1, 1)
days = (enddate + datetime.timedelta(days=1) - startdate).days
data = np.random.random(days)
dates = [startdate + datetime.timedelta(days=x) for x in range(0, days)]
ts = pd.Series(data, dates)
test = ts.groupby(lambda x: (x.year, x.day)).mean()
Group by the month and day, rather than the year and day:
test = ts.groupby([ts.index.month, ts.index.day]).mean()
yields
1 1 0.499264
2 0.449357
3 0.498883
...
12 17 0.408180
18 0.317682
19 0.467238
...
29 0.413721
30 0.399180
31 0.828423
Length: 366, dtype: float64
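The result has 366 entries because 29 February appears in the leap years. The sketch below repeats the grouping on deterministic data, so the numbers can be checked by hand, and then drops 29 February to get the 365-day year the question asks for. Dropping that day is an assumption about what is wanted; averaging it separately is equally possible.

```python
import numpy as np
import pandas as pd

# Deterministic stand-in for the random data: the value is the day's position.
dates = pd.date_range("2004-01-01", "2008-12-31", freq="D")
ts = pd.Series(np.arange(len(dates), dtype="float64"), index=dates)

# Mean over all years for each (month, day) pair.
mean_by_day = ts.groupby([ts.index.month, ts.index.day]).mean()

# Remove 29 February so that exactly 365 days remain.
months = mean_by_day.index.get_level_values(0)
days = mean_by_day.index.get_level_values(1)
standard_year = mean_by_day[~((months == 2) & (days == 29))]

print(len(mean_by_day), len(standard_year))
print(standard_year.head())
```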
