How to clear a Set at Midnight? - python

I have a script that receives information from an endpoint, stores this information and counts the number of occurrences in a set.
At 23:59 the size of the set is written to a file, and I want the set to be cleared at 00:00 (midnight) so that counting starts over the next day.
def delayedCounter(delayedSet):
    now = datetime.now()
    date = now.strftime('%Y-%m-%d')
    hour = datetime.now().strftime('%H:%M')
    with open('delayedData.csv','a+') as file:
        if hour == '23:59':
            file.write(f'Nº : {len(delayedSet)} Data: {date}\n')
        elif hour == '00:00':
            delayedSet.clear()
Along with this I am using Flask to display everything in a webapp.
I make the request every 1 minute using apscheduler.
However, it writes to the delayedData.csv file but does not reset the set.
Does anyone know what it could be? I can make more code available if needed
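One way to avoid depending on the job running during one specific minute is to remember the date of the previous call and treat any change of date as the rollover. This is a minimal sketch under that assumption, not a diagnosis of the code above; `DailyCounter`, `tick` and `delayed_counter` are hypothetical names that do not appear in the question.

```python
from datetime import datetime


class DailyCounter:
    """Hypothetical helper: holds the set and resets it when the date changes."""

    def __init__(self):
        self.items = set()
        self.current_date = None  # date seen on the most recent tick

    def tick(self, now=None):
        """Return (finished_date, count) on a date rollover, otherwise None."""
        now = now or datetime.now()
        today = now.date()
        if self.current_date is None:
            self.current_date = today
        if today == self.current_date:
            return None
        # The date changed since the last call: report the finished day, then reset.
        finished = (self.current_date, len(self.items))
        self.items.clear()
        self.current_date = today
        return finished


def delayed_counter(counter, path='delayedData.csv'):
    """Scheduled job: append yesterday's count to the file when a rollover is detected."""
    finished = counter.tick()
    if finished is not None:
        day, count = finished
        with open(path, 'a', encoding='utf-8') as file:
            file.write(f'Nº : {count} Data: {day}\n')
```

With this shape the count is written and the set is cleared in the same step, on the first call after midnight, so a job that fires a few seconds early or late still produces one line per day.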

Related

Simulate Time Series Events with Accurate Scheduler

I have an API that I need to run some tests against. We have already done the stress and load testing, but the best way to test is to run some real-life data. I have a fact table with all the historical data for the past years. The goal is to find a busy window of that history and "replay" it against our API.
Is there a way to "replay" time series data and simulate the API request activity in Python?
The input data is like this with hundreds of thousands rows a day:
TimeStamp Input Data
------------------------------------------
2020-01-01 00:00:01:231 ABC
2020-01-01 00:00:01:456 ABD
2020-01-01 00:00:01:789 XYZ
...
I first thought of converting each row into a cron entry, so that when each row is activated, it triggers a request to the API and uses the data entry as the payload.
However, this approach adds a lot of overhead from starting Python processes, and the time distribution gets distorted: within a single second it might start many processes, each loading the libraries, etc.
Is there a way I can start a long-running Python process to replay the time series data accurately (ideally to within a few milliseconds)?
Almost like:
while True:
    currenttime = datetime.now()
    # find from table rows with currentime
    # make web requests with those rows
But then this becomes synchronous, and every loop iteration requires a database lookup.
Perhaps you'd want to write your real-time playback routine to be something like this (pseudocode):
import time
from datetime import datetime

def playbackEventsInWindow(startTime, endTime):
    # Offset between the real clock and the recorded timeline, in seconds
    timeDiff = (datetime.now() - startTime).total_seconds()
    prevTime = startTime
    while True:
        nextEvent = GetFirstEventInListAfterSpecifiedTime(prevTime)
        if nextEvent:
            nextTime = nextEvent.getEventTimeStamp()
            if nextTime >= endTime:
                return  # we've reached the end of our window
            # Seconds until this event is due on the real clock
            sleepTimeSeconds = (nextTime - datetime.now()).total_seconds() + timeDiff
            if sleepTimeSeconds > 0.0:
                time.sleep(sleepTimeSeconds)
            executeWebRequestsForEvent(nextEvent)
            prevTime = nextTime
        else:
            return  # we've reached the end of the list
Note that a naive implementation of GetFirstEventInListAfterSpecifiedTime(timeStamp) would simply start at the beginning of the events-list and then linearly scan down the list until it found an event with a timestamp greater than the specified argument, and return that event... but that implementation would quickly become very inefficient if the events-list is long. However, you could tweak it by having it store the index of the value it returned on the previous call, and start its linear-scan at that position rather than from the top of the list. That would allow it to return quickly (i.e. usually after just one step) in the common case (i.e. where the requested timestamps are steadily increasing).
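The index-remembering lookup described above could be sketched roughly as follows. This assumes the events are already loaded into a list of (timestamp, payload) pairs sorted by timestamp, and that lookups are requested with non-decreasing timestamps; `EventCursor` and `first_after` are hypothetical names used only for illustration.

```python
class EventCursor:
    """Hypothetical helper: scans a sorted event list, resuming where it last stopped."""

    def __init__(self, events):
        self.events = events  # list of (timestamp, payload), sorted by timestamp
        self.index = 0        # position where the previous search stopped

    def first_after(self, timestamp):
        """Return the first event with a timestamp strictly greater than timestamp, or None."""
        # Resume from the stored index instead of the top of the list.
        while self.index < len(self.events):
            event = self.events[self.index]
            if event[0] > timestamp:
                return event
            self.index += 1
        return None


cursor = EventCursor([(1, 'ABC'), (2, 'ABD'), (5, 'XYZ')])
print(cursor.first_after(0))  # (1, 'ABC')
print(cursor.first_after(1))  # (2, 'ABD')
```

Because the cursor never moves backwards, a lookup with an earlier timestamp than a previous one would miss events; that is acceptable for playback, where time only advances.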

Getting microseconds past the hour

I am working on a quick program to generate DIS (Distributed Interactive Simulation) packets to stress test a gateway we have. I'm all set and raring to go, except for one small issue: I'm having trouble getting the current microseconds past the top of the hour correctly.
Currently I'm doing it like this:
now = dt.now()
minutes = int(now.strftime("%M"))
seconds = int(now.strftime("%S")) + minutes*60
microseconds = int(now.strftime("%f"))+seconds*(10**6)
However when I run this multiple times in a row, I'll get results all over the place, with numbers that cannot physically be right. Can someone sanity check my process??
Thanks very much
You can eliminate all that formatting and just do the following:
now = dt.now()
microseconds_past_the_hour = now.microsecond + 1000000*(now.minute*60 + now.second)
Keep in mind that running this multiple times in a row will continually produce different results, as the current time keeps advancing.
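To check the arithmetic independently of the moving clock, the same expression can be applied to a fixed timestamp. A small sketch, where the function name and the sample value are made up for illustration:

```python
from datetime import datetime


def microseconds_past_the_hour(now):
    """Microseconds elapsed since the start of the hour that contains `now`."""
    return now.microsecond + 1_000_000 * (now.minute * 60 + now.second)


# 12:01:02.000003 is 62 seconds and 3 microseconds past the hour.
sample = datetime(2020, 1, 1, 12, 1, 2, 3)
print(microseconds_past_the_hour(sample))  # 62000003
```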

Python Time 'Lag' Effect in URL

My server's timezone and the data that I have fetched via the following span two consecutive hours. Once the hour changes, the hour that the Python code produces is not found on the server that is providing the content, since the clock jumps to the next hour while the data has not been processed yet. In case you are wondering, the data in question is weather model data in .grib2 format.
I have the following code now:
#!/usr/bin/python
import time
# Save your URL to a variable:
url = time.strftime("http://nomads.ncep.noaa.gov/pub/data/nccf/nonoperational/com/hrrr/para/hrrr.%Y%m%d/hrrr.t%Hz.wrfnatf04.grib2")
# Save that string to a file:
with open('hrrr/hrrrf4.txt', 'a') as f: f.write(url+'\n')
Is there a way to 'lag' the %H variable in the above URL by one hour, or another method that will delay it to ensure smooth data processing for all desired hours?
Thank you for taking the time to answer my question.
The code below prints the current datetime and then offsets it by subtracting one hour; you could also add an hour, or use minutes, seconds, etc. I scrape lots of forums that are in different timezones than my scraping server, and that's how I adjust. This also helps if the server's clock is off a little bit: you can shift the time back or forward by however much you need.
import datetime
timenow = datetime.datetime.now()
timeonehourago = timenow - datetime.timedelta(hours=1)
url = timenow.strftime("http://nomads.ncep.noaa.gov/pub/data/nccf/nonoperational/com/hrrr/para/hrrr.%Y%m%d/hrrr.t%Hz.wrfnatf04.grib2")
offseturl = timeonehourago.strftime("http://nomads.ncep.noaa.gov/pub/data/nccf/nonoperational/com/hrrr/para/hrrr.%Y%m%d/hrrr.t%Hz.wrfnatf04.grib2")
print(url)
print(offseturl)
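The same offset can be wrapped in a function that takes the time as an argument, which makes the behaviour at midnight easy to check. This is a sketch only; `lagged_url` and `lag_hours` are hypothetical names, and it assumes the directory date should lag together with the hour.

```python
from datetime import datetime, timedelta

URL_PATTERN = (
    "http://nomads.ncep.noaa.gov/pub/data/nccf/nonoperational/com/hrrr/para/"
    "hrrr.%Y%m%d/hrrr.t%Hz.wrfnatf04.grib2"
)


def lagged_url(now, lag_hours=1):
    """Build the URL for the time `lag_hours` before `now`."""
    # Subtracting before formatting keeps the date and the hour consistent.
    return (now - timedelta(hours=lag_hours)).strftime(URL_PATTERN)


# Half past midnight lags back to hour 23 of the previous day.
print(lagged_url(datetime(2020, 1, 1, 0, 30)))
```

If the file names on the remote server follow UTC rather than local time (an assumption; the question does not say), passing `datetime.now(timezone.utc)` instead of a local time would be the corresponding change.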

How can I keep a list of recently seen users without running out of RAM/crashing my DB?

Here's some code that should demonstrate what I'm trying to do:
recently_seen = {}
user_id = 10
while True:
    current_time = datetime.datetime.now()
    if user_id not in recently_seen:
        recently_seen[user_id] = current_time
        print('seen {0}'.format(user_id))
    else:
        if current_time - recently_seen[user_id] > datetime.timedelta(seconds=5):
            recently_seen[user_id] = current_time
            print('seen {0}'.format(user_id))
    time.sleep(0.1)
Basically, my program is listening on a socket for users. This is wrapped in a loop that spits out user_ids as it sees them. This means, I'm seeing user_ids every few milliseconds.
What I'm trying to do is log the users it sees and at what times. Saying it saw a user at 0.1 seconds and then again at 0.7 seconds is silly. So I want to implement a 5 second buffer.
It should find a user and, if the user hasn't been seen within 5 seconds, log them to a database.
The two solutions I've come up with are:
1) Keep the user_id in a dictionary (similar to the sample code above) and check against this. The problem is, if it's running for a few days and continues finding new users, this will eventually use up my RAM
2) Log them to a database and check against that. The problem with this is, it finds users every few milliseconds. I don't want to read the database every few milliseconds...
I need some way of creating a list of limited size. That limit would be 5 seconds. Any ideas on how to implement this?
How about removing the user from your dictionary once you log them to the database?
Why aren't you using a DBM?
It will work like a dictionary but will be stored on the disk.
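One reading of the first suggestion is to keep the dictionary but drop entries once they are older than the 5-second window, so memory is bounded by the users seen recently rather than by every user ever seen. A minimal sketch under that interpretation; `RecentlySeen`, `should_log` and `prune` are hypothetical names, and times are plain seconds (for example from `time.monotonic()`).

```python
class RecentlySeen:
    """Hypothetical helper: suppresses repeat sightings inside a time window."""

    def __init__(self, window_seconds=5.0):
        self.window = window_seconds
        self.last_logged = {}  # user_id -> time the user was last logged

    def should_log(self, user_id, now):
        """Return True (and record `now`) unless the user was logged within the window."""
        last = self.last_logged.get(user_id)
        if last is not None and now - last < self.window:
            return False
        self.last_logged[user_id] = now
        return True

    def prune(self, now):
        """Forget users whose last logged time is at least one window old."""
        expired = [uid for uid, t in self.last_logged.items() if now - t >= self.window]
        for uid in expired:
            del self.last_logged[uid]
```

Calling `prune` every few seconds, rather than on every sighting, keeps the per-sighting cost to a single dictionary lookup; the database is only touched when `should_log` returns True.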

Help with Python loop weirdness?

I'm learning Python as my second programming language (my first real one if you don't count HTML/CSS/Javascript). I'm trying to build something useful as my first real application - an IRC bot that alerts people via SMS when certain things happen in the channel. Per a request by someone, I'm (trying) to build in scheduling preferences where people can choose not to get alerts from between hours X and Y of the day.
Anyways, here's the code I'm having trouble with:
db = open("db.csv")
for line in db:
    row = line.split(",") # storing stuff in a CSV, reading out of it
    recipient = row[0] # who the SMS is going to
    s = row[1] # gets the first hour of the "no alert" time range
    f = row[2] # gets last hour of above
    nrt = [] # empty array that will store hours
    curtime = time.strftime("%H") # current hour
    if s == "no":
        print "They always want alerts, sending email" # start time will = "no" if they always want alerts
        # send mail code goes here
    else:
        for hour in range(int(s), int(f)): #takes start, end hours, loops through to get hours in between, stores them in the above list
            nrt.append(hour)
        if curtime in nrt: # best way I could find of doing this, probably a better way, like I said I'm new
            print "They don't want an alert during the current hour, not sending" # <== what it says
        else:
            # they do want an alert during the current hour, send an email
            # send mail code here
The only problem I'm having is somehow the script only ends up looping through one of the lines (or something like that) because I only get one result every time, even if I have more than one entry in the CSV file.
If this is a regular CSV file you should not try to parse it yourself. Use the standard library csv module.
Here is a short example from the docs:
import csv
reader = csv.reader(open("some.csv", "rb"))
for row in reader:
    print row
There are at least two bugs in your program:
curtime = time.strftime("%H")
...
for hour in range(int(s), int(f)):
    nrt.append(hour)
# this is an inefficient synonym for
# nrt = range(int(s), int(f))
if curtime in nrt:
    ...
First, curtime is a string, whereas nrt is a list of integers. Python is strongly typed, so the two are not interchangeable, and won't compare equal:
'4' == 4 # False
'4' in [3, 4, 5] # False
This revised code addresses that issue, and is also more efficient than generating a list and searching for the current hour in it:
cur_hour = time.localtime().tm_hour
if int(s) <= cur_hour < int(f):
    # You can "chain" comparison operators in Python
    # so that a op1 b op2 c is equivalent to a op1 b and b op2 c
    ...
A second issue that the above does not address is that your program will not behave properly if the hours wrap around midnight (e.g. s = 22 and f = 8).
Neither of these problems is necessarily related to "the script only ends up looping through one of the lines", but you haven't given us enough information to figure out why that might be. A more useful way to ask questions is to post a brief but complete code snippet that shows the behavior you are observing, along with sample input and the resulting error messages, if any (along with the traceback).
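For the wrap-around case mentioned above (e.g. s = 22 and f = 8), one common approach is to treat a start hour greater than the end hour as a range that crosses midnight. A small sketch; the function name `in_quiet_hours` is hypothetical, and the end hour is excluded to match `range(int(s), int(f))`.

```python
def in_quiet_hours(hour, start, end):
    """True if `hour` is in [start, end), where the range may wrap past midnight."""
    if start <= end:
        # Ordinary range within one day, e.g. 9 to 17.
        return start <= hour < end
    # Wrapped range, e.g. 22 to 8: late evening or early morning.
    return hour >= start or hour < end


print(in_quiet_hours(23, 22, 8))  # True
print(in_quiet_hours(12, 22, 8))  # False
```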
Have you tried something simpler? Just to see how your file is actually read by Python:
db = open("db.csv")
for line in db:
    print line
There may be a problem with the format of your CSV file. That happens, for instance, when you open a Unix file in a Windows environment. In that case the whole file can look like a single string, because Windows and Unix use different line separators. I don't know the exact cause of your problem, but I suggest looking in that direction.
Update:
You have multiple paths through the body of your loop:
when s is "no": "They always want alerts, sending email" will be printed.
when s is not "no" and curtime in nrt: "They don't want an alert during the current hour, not sending" will be printed.
when s is not "no" and curtime in nrt is false (the last else): nothing will be printed and no other action undertaken.
Shouldn't you place some print statement in the last else branch?
Also, what is the exact output of your snippet? Is it "They always want alerts, sending email"?
I would check the logic in your conditionals. Your looping construct should work.
You could go through an existing, well-written IRC bot in Python.
Be explicit about what's in a row. Indexing with 0, 1, 2...n is actually your bug, and it makes the code very hard to read later, for yourself or others. So let's use tuple unpacking to show what we're expecting from a row. This works like code as documentation:
db = open("db.csv")
for line in db.readlines():
    recipient, start_hour, end_hour = line.split(",")
    nrt = []
    etc...
This shows the reader of your code what you're expecting a line to contain, and it would have shown your bug to you the first time you ran it :)
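The csv module and tuple unpacking suggested in the answers can be combined. The sketch below is written for Python 3 and assumes each row has exactly three fields (recipient, start hour, end hour); the sample rows are invented, and `read_preferences` is a hypothetical name.

```python
import csv
import io


def read_preferences(lines):
    """Yield (recipient, start_hour, end_hour) tuples from an iterable of CSV lines."""
    for row in csv.reader(lines):
        if not row:
            continue  # skip blank lines
        if len(row) != 3:
            raise ValueError(f"expected 3 fields, got {len(row)}: {row!r}")
        # Unpacking documents the expected layout; strip() drops stray spaces.
        recipient, start_hour, end_hour = (field.strip() for field in row)
        yield recipient, start_hour, end_hour


# Invented sample data standing in for db.csv.
sample = io.StringIO("alice,no,no\nbob,22,8\n")
for prefs in read_preferences(sample):
    print(prefs)
```

A row with the wrong number of fields raises an error that names the offending row, instead of silently producing a wrong hour.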
