I'd like some assistance in using a more granular time series in my Prophet forecast plots, specifically an hourly grain on the x-axis.
My data is aggregated for each hour of the day. In addition to the aggregated data, I create the necessary Prophet variables with:
ads_mod['y'] = ads_mod[target1]
ads_mod['ds'] = ads_mod['hour']
I then start the modeling process:
m = Prophet(interval_width=interval_width)
m.add_seasonality(name='hourly', period=1, fourier_order=30)
m.fit(ads_mod)
future = m.make_future_dataframe(periods=1, freq='H')
forecast = m.predict(future)
I plot the Forecast with:
fig = m.plot(forecast)
I have reviewed the actual code in the plot function and tried a variety of modifications to display the hour along with the date (i.e., the full datetime value) on the x-axis, without success.
In particular, I looked at the date transform:
fcst_t = fcst['ds'].dt.to_pydatetime()
After the transform, I see my data is still in a format that includes the hour. A fragment of the transformed data is below; in the plot itself, however, only the date (i.e., YYYY-MM-DD) is displayed on the x-axis:
fcst_t[:10]
Out[277]:
array([datetime.datetime(2019, 12, 2, 0, 0),
       datetime.datetime(2019, 12, 2, 1, 0),
       datetime.datetime(2019, 12, 2, 2, 0),
       datetime.datetime(2019, 12, 2, 3, 0),
       datetime.datetime(2019, 12, 2, 4, 0),
       datetime.datetime(2019, 12, 2, 5, 0),
       datetime.datetime(2019, 12, 2, 6, 0),
       datetime.datetime(2019, 12, 2, 7, 0),
       datetime.datetime(2019, 12, 2, 8, 0),
       datetime.datetime(2019, 12, 2, 9, 0)], dtype=object)
import matplotlib.dates as mdates
.... additional plot code here......
hours = mdates.HourLocator(interval = 1)
h_fmt = mdates.DateFormatter('%Y-%m-%d %H:%M:%S')
ax.xaxis.set_major_locator(hours)
ax.xaxis.set_major_formatter(h_fmt)
Here is the link: https://stackoverflow.com/questions/48790378/how-to-get-ticks-every-hour
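For reference, here is a minimal matplotlib-only sketch (independent of Prophet; the data is made up) showing HourLocator and DateFormatter applied to an axis. The same change can be applied to the figure returned by m.plot(forecast), e.g. by grabbing its axes with ax = fig.gca():

```python
import datetime

import matplotlib
matplotlib.use("Agg")  # non-interactive backend for this sketch
import matplotlib.dates as mdates
import matplotlib.pyplot as plt

# Hypothetical hourly data standing in for the forecast
times = [datetime.datetime(2019, 12, 2, h, 0) for h in range(10)]
values = list(range(10))

fig, ax = plt.subplots()
ax.plot(times, values)
# Tick every hour and show the full datetime, not just the date
ax.xaxis.set_major_locator(mdates.HourLocator(interval=1))
ax.xaxis.set_major_formatter(mdates.DateFormatter('%Y-%m-%d %H:%M'))
fig.autofmt_xdate()  # rotate the long labels so they do not overlap
```
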
I have a situation where I have a code with which I am processing data for operated shifts.
In it, I have arrays for the start and end of shifts (e.g. shift_start[0] and shift_end[0] for shift #1), and for the time between them I need to know how many weekdays, holidays, or weekend days it contains.
The holidays I have already defined in an array of datetime entries representing the holidays of a specific country (the country is not relevant here, and I am not looking for a more dynamic option yet).
So basically I have it like that:
started = [datetime.datetime(2022, 2, 1, 0, 0), datetime.datetime(2022, 2, 5, 8, 0), datetime.datetime(2022, 2, 23, 11, 19, 28)]
ended = [datetime.datetime(2022, 2, 2, 16, 0), datetime.datetime(2022, 2, 5, 17, 19, 28), datetime.datetime(2022, 4, 26, 12, 30)]
holidays = [datetime.datetime(2022, 1, 3), datetime.datetime(2022, 3, 3), datetime.datetime(2022, 4, 22), datetime.datetime(2022, 4, 25)]
I'm looking for a way to go through each of the 3 ranges and count the number of days of each type it contains (e.g. the first range contains 2 weekdays, the second, one weekend day).
So, based on the suggestion by @gimix, I was able to develop what I needed:
for each_start, each_end in zip(started, ended):  # For each period
    for single_date in self.daterange(each_start, each_end):  # For each day of each period
        # Checking if holiday or weekend
        if (single_date.replace(hour=0, minute=0, second=0, microsecond=0) in holidays) or (single_date.weekday() > 4):
            set_special_days_worked(1)
        # If not holiday or weekend, then it is a regular working day
        else:
            set_regular_days_worked(1)
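The snippet above depends on the class's own daterange helper and the two setter methods, which are not shown. A self-contained sketch of the same idea (the daterange helper and count_days function here are my own stand-ins, not the original code) that returns counts instead of calling setters:

```python
import datetime

holidays = [datetime.datetime(2022, 1, 3), datetime.datetime(2022, 3, 3),
            datetime.datetime(2022, 4, 22), datetime.datetime(2022, 4, 25)]

def daterange(start, end):
    # Yield each calendar day from start's day through end's day, at midnight
    day = start.replace(hour=0, minute=0, second=0, microsecond=0)
    last = end.replace(hour=0, minute=0, second=0, microsecond=0)
    while day <= last:
        yield day
        day += datetime.timedelta(days=1)

def count_days(start, end):
    regular = special = 0
    for day in daterange(start, end):
        # Holidays and weekend days (weekday() > 4 means Sat/Sun) are "special"
        if day in holidays or day.weekday() > 4:
            special += 1
        else:
            regular += 1
    return regular, special
```

For the example data, the first range (Feb 1 to Feb 2, 2022) yields 2 weekdays, and the second (Feb 5, a Saturday) yields 1 weekend day.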
Given today's date, what is an efficient way to retrieve the first and last date of each of the previous 3 months (i.e. '3/1/2020' and '3/31/2020'; '2/1/2020' and '2/29/2020'; '1/1/2020' and '1/31/2020')?
EDIT
For previous month's first and last, the following code is working as expected. But I am not sure how to retrieve the previous 2nd and 3rd month's first and last date.
from datetime import date, timedelta
last_day_of_prev_month = date.today().replace(day=1) - timedelta(days=1)
start_day_of_prev_month = (date.today().replace(day=1)
                           - timedelta(days=last_day_of_prev_month.day))
# For printing results
print("First day of prev month:", start_day_of_prev_month)
print("Last day of prev month:", last_day_of_prev_month)
You may
get the 3 previous months
create each first date with day 1, and each last date with the month length from calendar.monthrange
from calendar import monthrange
from datetime import datetime

def before_month(month):
    v = [9, 10, 11, 12, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
    return v[month:month + 3]

dd = datetime(2020, 4, 7)
dates = [[dd.replace(month=month, day=1),
          dd.replace(month=month, day=monthrange(dd.year, month)[1])]
         for month in before_month(dd.month)]
print(dates)
# [[datetime.datetime(2020, 1, 1, 0, 0), datetime.datetime(2020, 1, 31, 0, 0)],
# [datetime.datetime(2020, 2, 1, 0, 0), datetime.datetime(2020, 2, 29, 0, 0)],
# [datetime.datetime(2020, 3, 1, 0, 0), datetime.datetime(2020, 3, 31, 0, 0)]]
I did not find a nicer way to get the 3 previous months, but sometimes the easiest way is the one to use.
You can loop over the 3 previous months; just update the date to the first day of the current month at the end of every iteration:
from datetime import date, timedelta
d = date.today()
date_array = []
date_string_array = []
for month in range(1, 4):
    first_day_of_month = d.replace(day=1)
    last_day_of_previous_month = first_day_of_month - timedelta(days=1)
    first_day_of_previous_month = last_day_of_previous_month.replace(day=1)
    date_array.append((first_day_of_previous_month, last_day_of_previous_month))
    date_string_array.append((first_day_of_previous_month.strftime("%m/%d/%Y"),
                              last_day_of_previous_month.strftime("%m/%d/%Y")))
    d = first_day_of_previous_month
print(date_array)
print(date_string_array)
Results:
[(datetime.date(2020, 3, 1), datetime.date(2020, 3, 31)), (datetime.date(2020, 2, 1), datetime.date(2020, 2, 29)), (datetime.date(2020, 1, 1), datetime.date(2020, 1, 31))]
[('03/01/2020', '03/31/2020'), ('02/01/2020', '02/29/2020'), ('01/01/2020', '01/31/2020')]
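The loop above can be wrapped into a small helper that also handles the year rollover for free, since replace(day=1) minus one day naturally crosses into December of the previous year. A minimal sketch (the function name previous_month_bounds is mine):

```python
from datetime import date, timedelta

def previous_month_bounds(today, n=3):
    """Return (first_day, last_day) pairs for the n months before today,
    most recent month first."""
    bounds = []
    first_of_month = today.replace(day=1)
    for _ in range(n):
        last_day = first_of_month - timedelta(days=1)  # last day of previous month
        first_day = last_day.replace(day=1)            # first day of previous month
        bounds.append((first_day, last_day))
        first_of_month = first_day
    return bounds

print(previous_month_bounds(date(2020, 4, 7)))
```
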
I am trying to plot data from a temperature sensor with time steps. I have the time steps in "hh:mm:ss" format after conversion from string to datetime. The first value in the list is "21:47:22" and the last one is "06:12:22" the next day. I have been trying to plot these values in list order; however, the x-axis automatically orders them by time of day, from "00:00:00" to "24:00:00", so the points after midnight end up at the start of the axis.
Could you please advise how to solve this issue? Below is my code:
import matplotlib.pyplot as plt
import datetime

data = []
sensor1 = []
sensor2 = []
time = []
with open("output.txt", "r") as f:
    data = f.readlines()
first_sensor_len = len(data[0])
for var in data:
    if var[2:7] == "First" and len(var) == first_sensor_len:
        sensor1.append(var[28:33])
        sensor2.append(var[75:80])
        time.append(datetime.datetime.strptime(var[36:44], "%H:%M:%S"))
    elif var[2:8] == "Second" and len(var) == first_sensor_len:
        sensor2.append(var[29:34])
        sensor1.append(var[75:80])
        time.append(datetime.datetime.strptime(var[83:91], "%H:%M:%S"))
plt.plot(time, sensor1)
plt.show()
Suppose time looks like
timestr = ["21:47:22", "22:12:22", "23:12:22", "00:12:22", "01:12:22", "03:12:22", "06:12:22"]
time = [datetime.datetime.strptime(ts, "%H:%M:%S") for ts in timestr]
time
[datetime.datetime(1900, 1, 1, 21, 47, 22),
datetime.datetime(1900, 1, 1, 22, 12, 22),
datetime.datetime(1900, 1, 1, 23, 12, 22),
datetime.datetime(1900, 1, 1, 0, 12, 22),
datetime.datetime(1900, 1, 1, 1, 12, 22),
datetime.datetime(1900, 1, 1, 3, 12, 22),
datetime.datetime(1900, 1, 1, 6, 12, 22)]
You can use np.diff from numpy to mark every first time value of a new day: if the difference between two consecutive time values is negative, midnight passed in between.
(This boolean array is prepended with an initial False, which states that the first time value never has a day offset; the result of np.diff is generally one entry shorter than its input.)
import numpy as np
newday_marker = np.append(False, np.diff(time) < datetime.timedelta(0))
newday_marker
array([False, False, False, True, False, False, False], dtype=bool)
With np.cumsum this array can be transformed into the array of dayoffsets for each time value.
day_offset = np.cumsum(newday_marker)
day_offset
array([0, 0, 0, 1, 1, 1, 1], dtype=int32)
In the end this has to be converted to timedeltas and then can be added to the original list of time values:
date_offset = [datetime.timedelta(int(dt)) for dt in day_offset]
dtime = [t + dos for t, dos in zip(time, date_offset)]
dtime
[datetime.datetime(1900, 1, 1, 21, 47, 22),
datetime.datetime(1900, 1, 1, 22, 12, 22),
datetime.datetime(1900, 1, 1, 23, 12, 22),
datetime.datetime(1900, 1, 2, 0, 12, 22),
datetime.datetime(1900, 1, 2, 1, 12, 22),
datetime.datetime(1900, 1, 2, 3, 12, 22),
datetime.datetime(1900, 1, 2, 6, 12, 22)]
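Putting the steps above together, a minimal end-to-end sketch using the same sample times:

```python
import datetime
import numpy as np

timestr = ["21:47:22", "22:12:22", "23:12:22", "00:12:22",
           "01:12:22", "03:12:22", "06:12:22"]
time = [datetime.datetime.strptime(ts, "%H:%M:%S") for ts in timestr]

# Mark entries where the clock rolled over midnight (negative difference)
newday_marker = np.append(False, np.diff(time) < datetime.timedelta(0))
# Cumulative day offset for each entry
day_offset = np.cumsum(newday_marker)
# Shift each time value by its day offset so the sequence is monotonic
dtime = [t + datetime.timedelta(days=int(d)) for t, d in zip(time, day_offset)]
```

Plotting dtime instead of time keeps the points in chronological order across midnight.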
Update:
Technically, I want to convert log data into time-series frequencies in Spark. I've searched a lot, but didn't find a good way to deal with big data.
I know pd.DataFrame can get counts for a feature, but my dataset is too big for a dataframe,
which means I need to process each line with MapReduce.
What I've tried is probably naive...
I have a RDD, whose lines are lists of tuples, which looks like:
[(datetime.datetime(2015, 9, 1, 0, 4, 12), 1), (datetime.datetime(2015, 9, 2, 0, 4, 12), 1), (datetime.datetime(2015, 4, 1, 0, 4, 12), 1), (datetime.datetime(2015, 9, 1, 0, 4, 12), 1)]
[(datetime.datetime(2015, 10, 1, 0, 4, 12), 1), (datetime.datetime(2015, 7, 1, 0, 4, 12), 1)]
In each tuple, the first element is a date.
Can I write a map function in Spark (in Python) that fills the counts of tuples with the same (month, day, hour) into a 3-d array, using (month, day, hour) from the date as the (x, y, z) coordinates?
here is what I've done:
def write_array(input_rdd, array):
    for item in input_rdd:
        requestTime = item[0]
        array[requestTime.month - 1, requestTime.day - 1, requestTime.hour] += 1

array_to_fill = np.zeros([12, 31, 24], dtype=np.int)
filled_array = RDD_to_fill.map(lambda s: write_array(s, array_to_fill)).collect()
with open("output.txt", 'w') as output:
    json.dump(traffic, output)
And the error is:
Traceback (most recent call last):
File "traffic_count.py", line 67, in <module>
main()
File "traffic_count.py", line 58, in main
traffic = organic_userList.Map(lambda s: write_array(s, traffic_array)) \
AttributeError: 'PipelinedRDD' object has no attribute 'Map'
I thought there must be some way to save the elements in each line of the RDD into an existing data structure... Can someone help me?
Many thanks!
If you can have the output data be a list of ((month, day, hour), count) values, the following should work:
from pyspark import SparkConf, SparkContext
import datetime
conf = SparkConf().setMaster("local[*]").setAppName("WriteDates")
sc = SparkContext(conf = conf)
RDD_to_fill = sc.parallelize([(datetime.datetime(2015, 9, 1, 0, 4, 12), 1),(datetime.datetime(2015, 9, 2, 0, 4, 12), 1),(datetime.datetime(2015, 4, 1, 0, 4, 12), 1),(datetime.datetime(2015, 9, 1, 0, 4, 12),1), (datetime.datetime(2015, 10, 1, 0, 4, 12), 1), (datetime.datetime(2015, 7, 1, 0, 4, 12), 1)])
def map_date(tup):
    return ((tup[0].month, tup[0].day, tup[0].hour), tup[1])
date_rdd = RDD_to_fill.map(map_date).reduceByKey(lambda x, y: x + y)
# create a tuple for every (month, day, hour) and set the value to 0
zeros = []
for month in range(1, 13):
    for day in range(1, 32):
        for hour in range(24):
            zeros.append(((month, day, hour), 0))
zeros_rdd = sc.parallelize(zeros)
# union the date_rdd (dates with non-zero values) with the zeros_rdd (dates with all-zero values)
# and then aggregate them together (via addition) by key (i.e., date tuple)
filled_tups = date_rdd.union(zeros_rdd).reduceByKey(lambda x, y: x + y).collect()
Then, if you want to access the count for any (month, day, hour) period, you can easily do the following:
filled_dict = dict(filled_tups)
# get count for Sept 1 at 00:00
print(filled_dict[(9,1,0)]) # prints 2
Note this code doesn't properly account for non-existing days such as Feb 30, Feb 31, April 31, June 31...
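Outside Spark, the same keyed aggregation can be sketched with collections.Counter, which is handy for checking the logic on a small sample before running it on the cluster (the events list below is made up from the question's data):

```python
import datetime
from collections import Counter

events = [(datetime.datetime(2015, 9, 1, 0, 4, 12), 1),
          (datetime.datetime(2015, 9, 2, 0, 4, 12), 1),
          (datetime.datetime(2015, 4, 1, 0, 4, 12), 1),
          (datetime.datetime(2015, 9, 1, 0, 4, 12), 1)]

counts = Counter()
for ts, n in events:
    # Key by (month, day, hour), mirroring reduceByKey on the date tuple
    counts[(ts.month, ts.day, ts.hour)] += n

print(counts[(9, 1, 0)])  # 2
```

A Counter also returns 0 for missing keys, so no explicit zero-filling is needed.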
I have a column formatted as such in one of my models:
TEMP_START = models.DateTimeField(null=True)
And I am attempting to do an exact lookup using queryset syntax such as
x.filter(TEMP_START=my_datetime_object) # x can be thought of as y.objects.all()
This returns no objects, when it should do more than one. However,
x.filter(TEMP_START__date=my_datetime_object.date()).filter(TEMP_START__hour=my_datetime_object.hour)
does return the proper objects (they're hourly). Are direct datetime filters not supported, so keyword lookups must be used?
====== Edit with bad results:
Searching for: {'TEMP_START': datetime.datetime(2016, 3, 31, 2, 0)}
Values in column: [{'TEMP_START': datetime.datetime(2016, 3, 29, 8, 0)}, {'TEMP_START': datetime.datetime(2016, 3, 29, 14, 0)}, {'TEMP_START': datetime.datetime(2016, 3, 30, 2, 0)}, {'TEMP_START': datetime.datetime(2016, 3, 29, 20, 0)}, {'TEMP_START': datetime.datetime(2016, 3, 30, 8, 0)}, {'TEMP_START': datetime.datetime(2016, 3, 30, 20, 0)}, {'TEMP_START': datetime.datetime(2016, 3, 31, 2, 0)}, {'TEMP_START': datetime.datetime(2016, 3, 30, 14, 0)}]
Values being returned: []
Code:
args_timeframe_start = {variables.temp_start: self.ranked_timeframes[rank][variables.temp_start]}
print(args_timeframe_start)
print(self.query_data.values(variables.temp_start).distinct())
query_data = self.query_data.filter(**args_timeframe_start)
print(query_data.values(variables.temp_start).distinct())
You need to find out what my_datetime_object is. Most likely, because DateTimeField contains Python datetime.datetime objects, and a datetime.datetime is composed of year, month, day, hour, minute, second, and microsecond, comparing only the date and hour will match records, but an exact filter only matches a record in your database if the minute, second, and microsecond are identical as well.
Try this quickly and you will see what a datetime looks like; see also the Django docs on DateTimeField:
from datetime import datetime
date = datetime.now()
print(date)
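To illustrate why an exact lookup can fail (an illustrative sketch with made-up values, not the asker's actual data): two datetimes that agree down to the second still compare unequal if the microseconds differ, which is exactly what an exact filter checks:

```python
import datetime

stored = datetime.datetime(2016, 3, 31, 2, 0, 0, 500000)  # value as saved in the database
lookup = datetime.datetime(2016, 3, 31, 2, 0)             # value used in the filter

print(stored == lookup)                         # False: microseconds differ
print(stored.replace(microsecond=0) == lookup)  # True after truncation
```

This is why the __date and __hour lookups match while the direct filter returns nothing.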