How can I save out hourly dates to excel? - python

a=pd.date_range(start='1/1/2021', end='01/01/2022', freq='h')
I have these hourly dates for the year, but when I try to save them to Excel each one shows up as a single ridiculously big number. What I need is 4 columns (year; month; day; hour).
Thanks in advance!

One solution could be:
Use pd.DataFrame to construct a df with respective columns.
Use df.to_csv to write the df to a csv-file (also possible: directly to excel with df.to_excel).
import pandas as pd
a = pd.date_range(start='1/1/2021', end='01/01/2022', freq='h')
df = pd.DataFrame({'year': a.year,
'month': a.month,
'day': a.day,
'hour': a.hour})
print(df.head())
year month day hour
0 2021 1 1 0
1 2021 1 1 1
2 2021 1 1 2
3 2021 1 1 3
4 2021 1 1 4
df.to_csv('fname.csv')
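A small variation on the code above, as a sketch rather than the only way to do it: passing index=False keeps the row index out of the file, so only the four requested columns are written. The in-memory buffer (io.StringIO) is an assumption made here to keep the example self-contained; a filename works the same way. Note that the range includes both endpoints, so it ends with midnight on 1 January 2022.

```python
import io

import pandas as pd

a = pd.date_range(start='1/1/2021', end='01/01/2022', freq='h')
df = pd.DataFrame({'year': a.year,
                   'month': a.month,
                   'day': a.day,
                   'hour': a.hour})

# Write to an in-memory buffer instead of a file (illustrative choice)
buffer = io.StringIO()
df.to_csv(buffer, index=False)  # index=False: no extra index column

csv_text = buffer.getvalue()
header = csv_text.splitlines()[0]
print(len(df))   # number of hourly rows, both endpoints included
print(header)    # column names written to the file
```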

Related

How to extract year and month from date in python

I tried some functions, but they do not produce the right answer for some dates. For the first date, "12/11/2015", the month should be 11 instead of 12. Does anyone know how to solve this in general?
You just have to pass the boolean argument dayfirst=True, like so:
import pandas as pd
df = pd.DataFrame({
'date': ['12/11/2015', '19/11/2015', '7/12/2015']
})
df['month'] = pd.DatetimeIndex(df['date'], dayfirst=True).month
df['year'] = pd.DatetimeIndex(df['date'], dayfirst=True).year
print(df)
Result:
date month year
0 12/11/2015 11 2015
1 19/11/2015 11 2015
2 7/12/2015 12 2015
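If every date is known to follow the same day/month/year layout, spelling the format out is a possible alternative to dayfirst. The sketch below assumes the same three sample dates and parses the column once, then reads month and year from the result through the .dt accessor.

```python
import pandas as pd

df = pd.DataFrame({
    'date': ['12/11/2015', '19/11/2015', '7/12/2015']
})

# Parse once with an explicit day/month/year format
parsed = pd.to_datetime(df['date'], format='%d/%m/%Y')

df['month'] = parsed.dt.month
df['year'] = parsed.dt.year
print(df)
```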

Python: Pandas dataframe get the year to which the week number belongs and not the year of the date

I have a csv-file: https://data.rivm.nl/covid-19/COVID-19_aantallen_gemeente_per_dag.csv
I want to use it to provide insight into the corona deaths per week.
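import pandas as pd
# error_bad_lines is no longer available in recent pandas; on_bad_lines="skip" is its replacement
df = pd.read_csv("covid.csv", on_bad_lines="skip", sep=";")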
df = df.loc[df['Deceased'] > 0]
df["Date_of_publication"] = pd.to_datetime(df["Date_of_publication"])
df["Week"] = df["Date_of_publication"].dt.isocalendar().week
df["Year"] = df["Date_of_publication"].dt.year
df = df[["Week", "Year", "Municipality_name", "Deceased"]]
df = df.groupby(by=["Week", "Year", "Municipality_name"]).agg({"Deceased" : "sum"})
df = df.sort_values(by=["Year", "Week"])
print(df)
Everything seems to be working fine except for the first 3 days of 2021. The first 3 days of 2021 are part of the last week (53) of 2020: http://week-number.net/calendar-with-week-numbers-2021.html.
When I print the dataframe this is the result:
53 2021 Winterswijk 1
Woudenberg 1
Zaanstad 1
Zeist 2
Zutphen 1
So basically what I'm looking for is a way where this line returns the year of the week number and not the year of the date:
df["Year"] = df["Date_of_publication"].dt.year
You can use dt.isocalendar().year to set up df["Year"]:
df["Year"] = df["Date_of_publication"].dt.isocalendar().year
With this, the date 2021-01-01 gets year 2020, while 2021-01-04 gets year 2021.
This mirrors how you used dt.isocalendar().week to set up df["Week"]. Both values come from the same (year, week, day) triple returned by dt.isocalendar(), so they are always in sync.
Demo
date_s = pd.Series(pd.date_range(start='2021-01-01', periods=5, freq='1D'))
date_s
0
0 2021-01-01
1 2021-01-02
2 2021-01-03
3 2021-01-04
4 2021-01-05
date_s.dt.isocalendar()
year week day
0 2020 53 5
1 2020 53 6
2 2020 53 7
3 2021 1 1
4 2021 1 2
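To show how this plays out in a weekly aggregation, here is a minimal sketch with made-up counts (the three rows and their Deceased values are assumptions, not data from the RIVM file). Both Year and Week are taken from dt.isocalendar(), so the first days of January 2021 land in week 53 of 2020.

```python
import pandas as pd

# Hypothetical sample rows spanning the 2020/2021 boundary
df = pd.DataFrame({
    "Date_of_publication": pd.to_datetime(["2020-12-31", "2021-01-01", "2021-01-04"]),
    "Deceased": [1, 2, 3],
})

iso = df["Date_of_publication"].dt.isocalendar()
df["Year"] = iso["year"].astype(int)  # ISO year, not the calendar year
df["Week"] = iso["week"].astype(int)

weekly = df.groupby(["Year", "Week"])["Deceased"].sum()
print(weekly)
```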
You can simply subtract the two dates and then divide the days attribute of the resulting timedelta by 7.
For example, this is the time elapsed since the start of 2021 (assuming import datetime as dt):
time_delta = dt.datetime.today() - dt.datetime(2021, 1, 1)
The output is a datetime.timedelta object:
datetime.timedelta(days=75, seconds=84904, microseconds=144959)
For your problem, you'd do something like this:
weeks = (df["Date_of_publication"] - pd.to_datetime(df["Year"].astype(str) + "-01-01")).dt.days // 7
The output is the number of whole weeks between 1 January of df["Year"] and Date_of_publication. Note that this is not the same as the ISO week number.

How to convert columns in a dataframe into time series?

So I selected 3 columns from my dataframe in order to create a time series that I could then plot:
booking_date = pd.DataFrame({'day': hotel_bookings_cleaned["arrival_date_day_of_month"],
'month': hotel_bookings_cleaned["arrival_date_month"],
'year': hotel_bookings_cleaned["arrival_date_year"]})
and the output looks like:
day month year
0 1 July 2015
1 1 July 2015
2 1 July 2015
3 1 July 2015
4 1 July 2015
I tried using
dates = pd.to_datetime(booking_date)
but got the error message
ValueError: Unable to parse string "July" at position 0
I'm assuming I need to convert the Month column to a numeric value before I can convert it to a datetime, but I haven't been able to make any parsers work.
Try this
dates = pd.to_datetime(booking_date.astype(str).agg('-'.join, axis=1), format='%d-%B-%Y')
Out[13]:
0 2015-07-01
1 2015-07-01
2 2015-07-01
3 2015-07-01
4 2015-07-01
dtype: datetime64[ns]
Not sure if this is more performant than the previous answer, but you can convert your string column to integers with a dictionary mapping to fit the format that pandas expects in to_datetime()
month_map = {
'January':1,
'February':2,
'March':3,
'April':4,
'May':5,
'June':6,
'July':7,
'August':8,
'September':9,
'October':10,
'November':11,
'December':12
}
dates = pd.DataFrame({
'day':booking_date.day,
'month':booking_date.month.apply(lambda x: month_map[x]),
'year':booking_date.year
})
ts = pd.to_datetime(dates)
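The mapping does not have to be typed out by hand. As a sketch, the standard library's calendar.month_name can generate it; this assumes an English locale, since those names are locale-dependent. The two-row booking_date frame below is a stand-in for the real data.

```python
import calendar

import pandas as pd

# Stand-in for the booking_date frame from the question
booking_date = pd.DataFrame({'day': [1, 15],
                             'month': ['July', 'August'],
                             'year': [2015, 2016]})

# calendar.month_name[0] is an empty string, so skip it
month_map = {name: number
             for number, name in enumerate(calendar.month_name) if name}

# Replace the month names with numbers, then assemble the dates
dates = booking_date.assign(month=booking_date['month'].map(month_map))
ts = pd.to_datetime(dates)
print(ts)
```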

pd.to_datetime is getting half my dates with flipped day / months

My dataset has dates in the European format, and I'm struggling to convert them correctly with pd.to_datetime: for every date whose day is 12 or less, the month and day are switched.
Is there an easy solution to this?
import pandas as pd
import datetime as dt
df = pd.read_csv(loc,dayfirst=True)
df['Date']=pd.to_datetime(df['Date'])
Is there a way to force datetime to acknowledge that the input is formatted as dd/mm/yy?
Thanks for the help!
Edit, a sample from my dates:
renewal["Date"].head()
Out[235]:
0 31/03/2018
2 30/04/2018
3 28/02/2018
4 30/04/2018
5 31/03/2018
Name: Earliest renewal date, dtype: object
After running the following:
renewal['Date']=pd.to_datetime(renewal['Date'],dayfirst=True)
I get:
Out[241]:
0 2018-03-31 #Correct
2 2018-04-01 #<-- this number is wrong and should be 01-04 instead
3 2018-02-28 #Correct
Add format.
df['Date'] = pd.to_datetime(df['Date'], format='%d/%m/%Y')
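A short sketch of what the explicit format buys you, using made-up sample dates: ambiguous values such as 01/04/2018 are read as day first, and a string that does not fit the format raises an error instead of being silently reinterpreted.

```python
import pandas as pd

dates = pd.Series(['31/03/2018', '01/04/2018', '28/02/2018'])

# Every value is read strictly as day/month/year
parsed = pd.to_datetime(dates, format='%d/%m/%Y')
print(parsed)

# A value in a different layout is rejected rather than guessed at
try:
    pd.to_datetime(pd.Series(['2018-04-01']), format='%d/%m/%Y')
    mismatch_raised = False
except ValueError:
    mismatch_raised = True
print(mismatch_raised)
```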
You can control the date construction directly if you define separate columns for 'year', 'month' and 'day', like this:
import pandas as pd
df = pd.DataFrame(
{'Date': ['01/03/2018', '06/08/2018', '31/03/2018', '30/04/2018']}
)
date_parts = df['Date'].apply(lambda d: pd.Series(int(n) for n in d.split('/')))
date_parts.columns = ['day', 'month', 'year']
df['Date'] = pd.to_datetime(date_parts)
date_parts
# day month year
# 0 1 3 2018
# 1 6 8 2018
# 2 31 3 2018
# 3 30 4 2018
df
# Date
# 0 2018-03-01
# 1 2018-08-06
# 2 2018-03-31
# 3 2018-04-30

Pandas Python- can datetime be used with vectorized inputs

My pandas dataframe has year, month and day in the first 3 columns. To convert them into a datetime type, I use a for loop that passes the first 3 columns of each row to the datetime function. Is there any way I can avoid the for loop and get the dates as datetimes?
I'm not sure there's a vectorized hook, but you can use apply, anyhow:
>>> df = pd.DataFrame({"year": [1992, 2003, 2014], "month": [2,3,4], "day": [10,20,30]})
>>> df
day month year
0 10 2 1992
1 20 3 2003
2 30 4 2014
>>> import datetime  # pd.datetime is no longer available in recent pandas
>>> df["Date"] = df.apply(lambda x: datetime.datetime(x['year'], x['month'], x['day']), axis=1)
>>> df
day month year Date
0 10 2 1992 1992-02-10 00:00:00
1 20 3 2003 2003-03-20 00:00:00
2 30 4 2014 2014-04-30 00:00:00
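For the vectorized route the question asks about, pd.to_datetime can assemble dates directly from a DataFrame whose columns are named year, month and day. A minimal sketch with the same sample data, with no loop or apply involved:

```python
import pandas as pd

df = pd.DataFrame({"year": [1992, 2003, 2014],
                   "month": [2, 3, 4],
                   "day": [10, 20, 30]})

# to_datetime builds one timestamp per row from the named columns
df["Date"] = pd.to_datetime(df[["year", "month", "day"]])
print(df)
```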
