Choosing a date randomly in a period? - python

I want to randomly choose a date between 2021/1/1 and 2021/12/31. The process might look like this:
generate a list of dates from 2021/1/1 to 2021/12/31, 365 elements in total;
randomly choose a date from the list.
Thanks!

As you tagged the question pandas, here is a pandas way:
out = (pd.date_range('2021/1/1', '2021/12/31') # all dates in the range
         .strftime('%Y/%m/%d') # format as string
         .to_series().sample(n=1) # select 1 random date
         .squeeze() # reduce the one-element Series to a scalar
      )
Variant, setting the start date and the number of days:
out = (pd.date_range('2021/1/1', periods=365) # 365 consecutive dates
         .strftime('%Y/%m/%d') # format as string
         .to_series().sample(n=1) # select 1 random date
         .squeeze() # reduce the one-element Series to a scalar
      )
example output: '2021/10/09'
list of random dates
You can easily adapt to generate several dates:
out = (pd
.date_range('2021/1/1', periods=365)
.strftime('%Y/%m/%d').to_series()
.sample(n=10)
.to_list()
)
example output:
['2021/04/06', '2021/09/11', '2021/08/02', '2021/09/17', '2021/12/30',
'2021/10/27', '2021/03/09', '2021/02/27', '2021/11/28', '2021/01/18']
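If the draw needs to be reproducible, or the same date may be picked more than once, `Series.sample` also accepts `random_state` and `replace`. A minimal sketch; the seed value 42 and the variable names are arbitrary choices for illustration:

```python
import pandas as pd

# All 365 dates of 2021, formatted as strings
dates = pd.date_range('2021/1/1', '2021/12/31').strftime('%Y/%m/%d').to_series()

# Same seed -> same draw on every run (42 is an arbitrary choice)
first = dates.sample(n=10, random_state=42).to_list()
second = dates.sample(n=10, random_state=42).to_list()

# replace=True allows duplicates, so n may exceed the number of dates
with_repeats = dates.sample(n=400, replace=True, random_state=42).to_list()
```

Without `replace=True`, asking for more than 365 dates raises an error, because each date can only be drawn once.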

Here's another way, using a random integer between two epoch timestamps:
import pandas as pd
import numpy as np
date1 = pd.Timestamp("2021/01/01")
date2 = pd.Timestamp("2021/12/31")
print(date1.timestamp())
print(date2.timestamp())
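n = 3 # How many samples to take
# randint excludes the upper bound, so extend it by one day to make 2021-12-31 reachable
low = int(date1.timestamp())
high = int((date2 + pd.Timedelta(days=1)).timestamp())
out = pd.to_datetime(np.random.randint(low, high, n), unit="s").normalize()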
print(out)
Output:
1609459200.0
1640908800.0
DatetimeIndex(['2021-04-13', '2021-01-17', '2021-08-24'], dtype='datetime64[ns]', freq=None)
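A variation on the epoch idea is to draw whole-day offsets instead of seconds, which removes the need to normalize afterwards. This is only a sketch: it assumes NumPy's `Generator` API (`np.random.default_rng`) is available, and the seed is arbitrary.

```python
import numpy as np
import pandas as pd

start = pd.Timestamp("2021-01-01")
end = pd.Timestamp("2021-12-31")
n = 3  # how many samples to take

rng = np.random.default_rng(0)  # arbitrary seed, only for repeatability
n_days = (end - start).days  # 364 for this range

# endpoint=True makes the upper bound inclusive, so `end` itself can be drawn
offsets = rng.integers(0, n_days, size=n, endpoint=True)
out = start + pd.to_timedelta(offsets, unit="D")
print(out)
```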

from datetime import date, timedelta
from random import sample
start = date(2021, 1, 1)
end = date(2021, 12, 31)
dates = []
day = start
while day <= end:
    dates.append(day)
    day = day + timedelta(days=1)
sample(dates, 1) # returns a one-element list

Related

How to get last three months in year-mon format in python?

I want to get the last three months before today's date in year-mon format. For example, if today's date is 2021-08-04, then I want the list of the last three months as:
["2021-05", "2021-06", "2021-07"]
I have no idea how to start with this. Any help will be appreciated.
Use dateutil's relativedelta to get consistent results, since not all months have the same number of days. For example:
from datetime import datetime
from dateutil.relativedelta import relativedelta
NOW = datetime.now() # reference date
delta = relativedelta(months=-1) # delta in time
n = 3 # how many steps
fmt = lambda dt: dt.strftime("%Y-%m") # formatter; datetime object to string
l = sorted((fmt(NOW+delta*i) for i in range(1, n+1)))
# l
# ['2021-05', '2021-06', '2021-07']
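If adding a dependency on dateutil is not an option, the same list can be built with integer arithmetic on the year and month, which also sidesteps the unequal month lengths. A minimal sketch using only the standard library; the helper name `last_n_months` is made up for this example:

```python
from datetime import date

def last_n_months(ref, n=3):
    """Return the n months before ref's month as 'YYYY-MM' strings, oldest first."""
    index = ref.year * 12 + (ref.month - 1)  # months counted from year 0
    result = []
    for i in range(n, 0, -1):
        year, month0 = divmod(index - i, 12)
        result.append(f"{year:04d}-{month0 + 1:02d}")
    return result

print(last_n_months(date(2021, 8, 4)))
# ['2021-05', '2021-06', '2021-07']
```

Because the months are counted as a single integer, a reference date in January rolls back into the previous year without special handling.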

Generate a random list of n dates in the iso 8601 format within a range in Python

I want to generate a list of n random dates in ISO 8601 format within the range from 2019-01-01 to 2019-12-31.
from datetime import date
start_date = date(2019,1,1)
end_date = date(2019,12,31)
Other threads I've looked at simply give the list of all dates within that range, but that's not what I need. I also need the dates to be in the iso8601 format. What is the best way to achieve this?
You can use random.sample to sample without replacement or random.choices to sample with replacement after generating a list of all the dates in the range.
If you don't want to store the list you could also generate N random numbers from 1 through 365, then convert those to the appropriate dates.
import random
from datetime import date, timedelta
end_date = date(2019, 12, 31)
current_date = date(2019, 1, 1)
n = 3
step = timedelta(days=1)
dates = [current_date]
while current_date != end_date:
    current_date += step
    dates.append(current_date)
random_dates = random.choices(dates, k=n)
print([d.isoformat() for d in random_dates])
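The no-list alternative mentioned above (draw random day numbers, then convert them to dates) could look roughly like this. It is a sketch that samples with replacement; the variable names are illustrative:

```python
import random
from datetime import date, timedelta

start_date = date(2019, 1, 1)
end_date = date(2019, 12, 31)
n = 3

span = (end_date - start_date).days  # 364

# randint includes both bounds, so offsets run from 0 to 364
random_dates = [start_date + timedelta(days=random.randint(0, span)) for _ in range(n)]
print([d.isoformat() for d in random_dates])
```

To sample without replacement instead, the offsets can be drawn with `random.sample(range(span + 1), n)`.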
You can do something like this
import datetime
import random
# startdate
start_date = datetime.date(2019, 1, 1)
# enddate
end_date = datetime.date(2019, 12, 31)
time_between_dates = end_date - start_date
days_between_dates = time_between_dates.days
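# seed from system entropy (the default behaviour)
random.seed(a=None)
# randrange excludes its upper bound, so add 1 to make end_date reachable
random_number_of_days = random.randrange(days_between_dates + 1)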
random_date = start_date + datetime.timedelta(days=random_number_of_days)
print(str(random_date))
Which gave the following result when I ran it
2019-06-07
A similar question has been asked here
Python - Generate random dates to create Gantt sequenced tasks?
Most of the code is from there except the last loop
I create a DataFrame with a DatetimeIndex built from two ISO 8601 timestamps, resample the index to 30-minute intervals, and then randomly choose 3 items from it.
import random
from datetime import datetime
import pandas as pd
df=pd.DataFrame({'timestamp':['2019-01-01T00:00:00.000Z','2019-12-31T23:59:59.300Z']})
df['timestamp']=df['timestamp'].apply(lambda timestamp: datetime.strptime(timestamp, '%Y-%m-%dT%H:%M:%S.%f%z'))
print(df['timestamp'])
df=df.set_index('timestamp')
dates = df.resample('30Min').max().dropna()
#print(dates)
random_dates = random.choices(dates.index, k=3)
print(random_dates)
output:
[Timestamp('2019-08-29 16:30:00+0000', tz='UTC', freq='30T'), Timestamp('2019-11-09 03:30:00+0000', tz='UTC', freq='30T'), Timestamp('2019-08-02 12:00:00+0000', tz='UTC', freq='30T')]

pandas - get a dataframe for every day

I have a DataFrame with dates in the index. I make a Subset of the DataFrame for every Day. Is there any way to write a function or a loop to generate these steps automatically?
import json
import requests
import pandas as pd
from pandas.io.json import json_normalize
import datetime as dt
#Get the channel feeds from ThingSpeak
response = requests.get("https://api.thingspeak.com/channels/518038/feeds.json?api_key=XXXXXX&results=500")
#Convert Json object to Python object
response_data = response.json()
channel_head = response_data["channel"]
channel_bottom = response_data["feeds"]
#Create DataFrame with Pandas
df = pd.DataFrame(channel_bottom)
#rename Parameters
df = df.rename(columns={"field1":"PM 2.5","field2":"PM 10"})
#Drop all entries with at least one NaN
df = df.dropna(how="any")
#Convert time to datetime object
df["created_at"] = df["created_at"].apply(lambda x:dt.datetime.strptime(x,"%Y-%m-%dT%H:%M:%SZ"))
#Set dates as Index
df = df.set_index(keys="created_at")
#Make a DataFrame for every day
df_2018_12_07 = df.loc['2018-12-07']
df_2018_12_06 = df.loc['2018-12-06']
df_2018_12_05 = df.loc['2018-12-05']
df_2018_12_04 = df.loc['2018-12-04']
df_2018_12_03 = df.loc['2018-12-03']
df_2018_12_02 = df.loc['2018-12-02']
Supposing that you run this on the first day of the following week (so that on Monday you export the previous Monday to Sunday), you can do it as follows:
from datetime import date, timedelta
today = date.today()
day = today - timedelta(days=7) # so, if today is Monday, we start on the Monday before
while day < today:
    df1 = df.loc[str(day)]
    df1.to_csv('mypath'+str(day)+'.csv') # so that the export files have different names
    day = day + timedelta(days=1)
You can use:
from datetime import date
today = str(date.today())
df = df.loc[today]
and schedule the script using any scheduler such as crontab.
You can create dictionary of DataFrames - then select by keys for DataFrame:
dfs = dict(tuple(df.groupby(df.index.strftime('%Y-%m-%d'))))
print (dfs['2018-12-07'])
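The dictionary approach can also be written as a comprehension over `groupby`, which makes it easy to loop over the days afterwards. A small self-contained sketch; the three sample rows are made up and only stand in for the sensor data:

```python
import pandas as pd

# Small stand-in for the sensor data (values are made up)
idx = pd.to_datetime(["2018-12-06 08:00", "2018-12-06 20:00", "2018-12-07 09:30"])
df = pd.DataFrame({"PM 2.5": [11.0, 12.5, 9.0]}, index=idx)

# One sub-DataFrame per calendar day, keyed by 'YYYY-MM-DD'
frames = {str(day): group for day, group in df.groupby(df.index.date)}

for day, group in frames.items():
    print(day, len(group))
```

Only days that actually have rows appear as keys, so looking up a day without data raises a `KeyError`.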

Pandas date_range starting from the end date to start date

I am trying to generate a range of semi-annual dates using Python. Pandas provides the function pd.date_range to help with this; however, I would like my date range to start from the end date and iterate backwards.
For instance given the input:
start = datetime.datetime(2016 ,2, 8)
end = datetime.datetime(2018 , 6, 1)
pd.date_range(start, end, freq='6m')
The result is:
DatetimeIndex(['2016-02-29', '2016-08-31', '2017-02-28', '2017-08-31',
'2018-02-28'])
How can I generate the following:
DatetimeIndex(['2016-02-08', '2016-06-01', '2016-12-01', '2017-06-01',
'2017-12-01', '2018-06-01'])
With the updated output (from the edit you made) you can do something like the following:
import datetime
import pandas as pd
from pandas.tseries.offsets import DateOffset
end = datetime.datetime(2018 , 6, 1)
start = datetime.datetime(2016 ,2, 8)
#Get the range of months to cover
months = (end.year - start.year)*12 + end.month - start.month
#The frequency of periods
period = 6 # in months
pd.DatetimeIndex([end - DateOffset(months=e) for e in range(0, months, period)][::-1]).insert(0, start)
This is a fairly concise solution, though I didn't compare runtimes so I'm not sure how fast it is.
Basically this is just creating the dates you need as a list, and then converting it to a datetime index.
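The same idea can be wrapped in a small function that keeps stepping back from the end date until it passes the start date. This is a sketch under the assumption that the start date should always be included as the first element; the name `backward_range` is invented for this example:

```python
import pandas as pd

def backward_range(start, end, months=6):
    """Dates stepping back from `end` in `months`-sized steps, plus `start` in front."""
    start, end = pd.Timestamp(start), pd.Timestamp(end)
    dates = []
    k = 0
    current = end
    while current > start:
        dates.append(current)
        k += 1
        # Offset from `end` each time so month-end clipping does not accumulate
        current = end - pd.DateOffset(months=months * k)
    dates.append(start)
    return pd.DatetimeIndex(dates[::-1])

idx = backward_range("2016-02-08", "2018-06-01")
print(idx)
```

Computing every date from `end` directly, rather than from the previous result, keeps the day of the month stable when `end` falls on the 29th, 30th, or 31st.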
This can be done without pandas, using dateutil instead. However, it is more involved than it perhaps should be:
from datetime import date
import math
from dateutil.relativedelta import relativedelta
#set up key dates
start = date(2016 ,2, 8)
end = date(2018 , 6, 1)
#calculate date range and number of 6 month periods
daterange = end-start
periods = daterange.days *2//365
#calculate next date in sequence and check for year roll-over
next_date = date(start.year,math.ceil(start.month/6)*6,1)
if next_date < start: next_date = date(next_date.year+1,next_date.month,1)
#add the first two values to a list
arr = [start.isoformat(),next_date.isoformat()]
#calculate all subsequent dates using 'relativedelta'
for i in range(periods):
    next_date = next_date+ relativedelta(months=+6)
    arr.append(next_date.isoformat())
#display results
print(arr)

Python / Pandas / Numpy - Direct calculation of number of business days between two dates excluding holidays

Is there a better / more direct way to calculate this than the following?
# 1. Set up the start and end date for which you want to calculate the
# number of business days excluding holidays.
import datetime
import numpy as np
start_date = '01JAN1986'
end_date = '31DEC1987'
start_date = datetime.datetime.strptime(start_date, '%d%b%Y')
end_date = datetime.datetime.strptime(end_date, '%d%b%Y')
# 2. Generate a list of holidays over this period
from pandas.tseries.holiday import USFederalHolidayCalendar
calendar = USFederalHolidayCalendar()
holidays = calendar.holidays(start_date, end_date)
holidays
Which gives a pandas.tseries.index.DatetimeIndex
DatetimeIndex(['1986-01-01', '1986-01-20', '1986-02-17', '1986-05-26',
'1986-07-04', '1986-09-01', '1986-10-13', '1986-11-11',
'1986-11-27', '1986-12-25', '1987-01-01', '1987-01-19',
'1987-02-16', '1987-05-25', '1987-07-03', '1987-09-07',
'1987-10-12', '1987-11-11', '1987-11-26', '1987-12-25'],
dtype='datetime64[ns]', freq=None, tz=None)
But you need a list for numpy busday_count
holiday_date_list = holidays.date.tolist()
Then with and without the holidays you get:
np.busday_count(start_date.date(), end_date.date())
>>> 521
np.busday_count(start_date.date(), end_date.date(), holidays = holiday_date_list)
>>> 501
There are some other questions slightly similar but generally working with pandas Series or Dataframes (Get business days between start and end date using pandas, Counting the business days between two series)
If you put the index you created in a dataframe, you can use resample to fill in the gaps. The offset passed to .resample() can include things like business days and even (custom) calendars:
import pandas as pd
from pandas.tseries.holiday import USFederalHolidayCalendar
C = pd.offsets.CustomBusinessDay(calendar=USFederalHolidayCalendar())
start_date = '01JAN1986'
end_date = '31DEC1987'
(
pd.DataFrame(index=pd.to_datetime([start_date, end_date]))
.resample(C, closed='right')
.asfreq()
.index
.size
) - 1
The size of the index minus 1 then gives us the number of days.
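One detail to keep in mind when comparing counts: `np.busday_count` treats the end date as exclusive, so the end day itself is not counted unless the range is extended by one day. A small sketch on a single week; the holiday here is made up purely for illustration:

```python
import numpy as np

# Monday 2021-01-04 to Friday 2021-01-08, with a made-up holiday on Wednesday
begin = np.datetime64("2021-01-04")
end = np.datetime64("2021-01-08")
holidays = [np.datetime64("2021-01-06")]

exclusive = np.busday_count(begin, end)  # Mon-Thu -> 4
inclusive = np.busday_count(begin, end + np.timedelta64(1, "D"))  # Mon-Fri -> 5
inclusive_no_holidays = np.busday_count(
    begin, end + np.timedelta64(1, "D"), holidays=holidays
)  # Mon, Tue, Thu, Fri -> 4
```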
