I am working on a quick program to generate DIS (Distributed Interactive Simulation) packets to stress test a gateway we have. I'm all set and raring to go, except for one small issue: I'm having trouble pulling the current microseconds past the top of the hour correctly.
Currently I'm doing it like this:
from datetime import datetime as dt

now = dt.now()
minutes = int(now.strftime("%M"))
seconds = int(now.strftime("%S")) + minutes*60
microseconds = int(now.strftime("%f")) + seconds*(10**6)
However, when I run this multiple times in a row, I get results all over the place, with numbers that cannot physically be right. Can someone sanity check my process?
Thanks very much
You can eliminate all that formatting and just do the following:
now = dt.now()
microseconds_past_the_hour = now.microsecond + 1000000*(now.minute*60 + now.second)
Keep in mind that running this multiple times in a row will continually produce different results, as the current time keeps advancing.
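As a quick sanity check, here is a minimal sketch (not from the original post) that computes the value both ways from one captured timestamp. Because both formulas read the same now object, they always agree, which backs up the point above that differences between runs just reflect the clock advancing:

from datetime import datetime as dt

now = dt.now()  # capture a single timestamp and reuse it for both computations

# Attribute-based computation
attr_result = now.microsecond + 1000000*(now.minute*60 + now.second)

# strftime-based computation (the original approach), using the same 'now'
seconds = int(now.strftime("%S")) + int(now.strftime("%M"))*60
fmt_result = int(now.strftime("%f")) + seconds*(10**6)

print attr_result == fmt_result  # prints True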
I have an API that I need to run some tests against. We have already done stress and load testing, but the best way to test is to run real-life data through it. I have a fact table with all the historical data for the past years. The goal is to find a busy window in that history and "replay" it against our API.
Is there a way to "replay" time series data and simulate the API request activity in Python?
The input data looks like this, with hundreds of thousands of rows a day:
TimeStamp Input Data
------------------------------------------
2020-01-01 00:00:01:231 ABC
2020-01-01 00:00:01:456 ABD
2020-01-01 00:00:01:789 XYZ
...
I first thought of converting each row into a cron entry, so that when each row fires, it triggers a request to the API with the data entry as the payload.
However, this approach adds a lot of overhead from starting Python processes, and the time distribution gets whacked: within a single second it might start lots of processes, load the libraries, etc.
Is there a way I can start a single long-running Python process that replays the time series data faithfully? (Ideally accurate to within a few milliseconds.)
Almost like:
while True:
    currenttime = datetime.now()
    # find the rows in the table matching currenttime
    # make web requests with those rows
But then this becomes synchronous, and every loop requires a database lookup...
Perhaps you'd want to write your real-time playback routine to be something like this (pseudocode):
def playbackEventsInWindow(startTime, endTime):
    # Offset between the wall clock "now" and the start of the historical window
    timeDiff = (datetime.now() - startTime).total_seconds()
    prevTime = startTime
    while True:
        nextEvent = GetFirstEventInListAfterSpecifiedTime(prevTime)
        if nextEvent:
            nextTime = nextEvent.getEventTimeStamp()
            if (nextTime >= endTime):
                return  # we've reached the end of our window
            # Sleep until the event's historical timestamp, shifted into real time
            sleepTimeSeconds = (nextTime - datetime.now()).total_seconds() + timeDiff
            if (sleepTimeSeconds > 0.0):
                time.sleep(sleepTimeSeconds)
            executeWebRequestsForEvent(nextEvent)
            prevTime = nextTime
        else:
            return  # we've reached the end of the list
Note that a naive implementation of GetFirstEventInListAfterSpecifiedTime(timeStamp) would simply start at the beginning of the events-list and then linearly scan down the list until it found an event with a timestamp greater than the specified argument, and return that event... but that implementation would quickly become very inefficient if the events-list is long. However, you could tweak it by having it store the index of the value it returned on the previous call, and start its linear-scan at that position rather than from the top of the list. That would allow it to return quickly (i.e. usually after just one step) in the common case (i.e. where the requested timestamps are steadily increasing).
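For illustration, here is a minimal sketch of that remembered-position optimization; the EventCursor class and the .timestamp attribute are assumptions for the example, not names from the original post:

# Hypothetical helper: scans a timestamp-sorted list of events, remembering
# where the previous call stopped so the common case returns after one step.
class EventCursor(object):
    def __init__(self, events):
        self.events = events        # assumed sorted by .timestamp, ascending
        self.lastIndex = 0          # where the previous scan left off

    def GetFirstEventInListAfterSpecifiedTime(self, timeStamp):
        # If the caller asked for an earlier time than last call, rescan from the top.
        if self.lastIndex > 0 and self.events[self.lastIndex - 1].timestamp > timeStamp:
            self.lastIndex = 0
        while self.lastIndex < len(self.events):
            event = self.events[self.lastIndex]
            self.lastIndex += 1
            if event.timestamp > timeStamp:
                return event
        return None                 # no events after timeStamp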
If I don't have access to the Python time module, how would I determine the number of days that have passed since I was born? That is, within the code, how many days old am I?
The code I am trying to understand better is this:
import time
start = raw_input("Enter the time stamp in seconds: ")
start = float(start)
end = time.time()
elapsed = end - start
st_elapsed = time.gmtime(elapsed)
print "\n"
hours_mins_secs = time.strftime("%H:%M:%S", st_elapsed)
print "Elapsed time in HH:MM:SS ->", hours_mins_secs, "\n"
I looked at https://docs.python.org/2/library/time.html, but I didn't find an alternative for working with time without using the time module.
My goal is to understand this code better.
This sounds like a homework question. You should give us what you've done so far and we will help you. SO users are not your personal coders.
(No criticism intended)
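That said, if the only constraint is avoiding the time module, the datetime module by itself can answer the "how many days old am I" part. A minimal sketch, with a made-up birth date:

from datetime import date

birth_date = date(1990, 5, 17)            # example birth date, not a real one
today = date.today()
age_in_days = (today - birth_date).days   # subtracting two dates gives a timedelta
print "Days since birth:", age_in_days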
My server's timezone and the data that I fetch via the code below span two consecutive hours. Once the hour rolls over, the hour that the Python code builds into the URL is no longer found on the server providing the content, since the server has jumped to the next hour while the data has not been processed yet. In case you are wondering, the data in question is weather model data in .grib2 format.
I have the following code now:
#!/usr/bin/python
import time
# Save your URL to a variable:
url = time.strftime("http://nomads.ncep.noaa.gov/pub/data/nccf/nonoperational/com/hrrr/para/hrrr.%Y%m%d/hrrr.t%Hz.wrfnatf04.grib2")
# Save that string to a file:
with open('hrrr/hrrrf4.txt', 'a') as f: f.write(url+'\n')
Is there a way to 'lag' the %H variable in the above URL by one hour, or is there another method that will delay it, to ensure smooth data processing for all desired hours?
Thank you for taking the time to answer my question.
The code below prints out the datetime of now and then offsets it by subtracting 1 hour; you could also add an hour, or minutes, seconds, etc. I scrape lots of forums that are in different timezones than my scraping server, and that's how I adjust anyway. This also helps if the server's clock is off a little bit: you can adjust the time back or forward however much you need.
import datetime
timenow = datetime.datetime.now()
timeonehourago = timenow - datetime.timedelta(hours=1)
url = timenow.strftime("http://nomads.ncep.noaa.gov/pub/data/nccf/nonoperational/com/hrrr/para/hrrr.%Y%m%d/hrrr.t%Hz.wrfnatf04.grib2")
offseturl = timeonehourago.strftime("http://nomads.ncep.noaa.gov/pub/data/nccf/nonoperational/com/hrrr/para/hrrr.%Y%m%d/hrrr.t%Hz.wrfnatf04.grib2")
print url
print offseturl
I busted through my daily free quota on a new project this weekend. For reference, that's .05 million writes, or 50,000 if my math is right.
Below is the only code in my project that is making any Datastore write operations.
old = Streams.query().fetch(keys_only=True)
ndb.delete_multi(old)
try:
    r = urlfetch.fetch(url=streams_url,
                       method=urlfetch.GET)
    streams = json.loads(r.content)
    for stream in streams['streams']:
        stream = Streams(channel_id=stream['_id'],
                         display_name=stream['channel']['display_name'],
                         name=stream['channel']['name'],
                         game=stream['channel']['game'],
                         status=stream['channel']['status'],
                         delay_timer=stream['channel']['delay'],
                         channel_url=stream['channel']['url'],
                         viewers=stream['viewers'],
                         logo=stream['channel']['logo'],
                         background=stream['channel']['background'],
                         video_banner=stream['channel']['video_banner'],
                         preview_medium=stream['preview']['medium'],
                         preview_large=stream['preview']['large'],
                         videos_url=stream['channel']['_links']['videos'],
                         chat_url=stream['channel']['_links']['chat'])
        stream.put()
    self.response.out.write("Done")
except urlfetch.Error, e:
    self.response.out.write(e)
This is what I know:
There will never be more than 25 "stream" entries in "streams", so it's guaranteed to call .put() exactly 25 times.
I delete everything from the table at the start of this call because everything needs to be refreshed every time it runs.
Right now, this code is on a cron running every 60 seconds. It will never run more often than once a minute.
I have verified all of this by enabling Appstats and I can see the datastore_v3.Put count go up by 25 every minute, as intended.
I have to be doing something wrong here, because 25 a minute is 1,500 writes an hour, not the ~50,000 that I'm seeing now.
Thanks
You are mixing up two different things here: write API calls (what your code calls) and low-level datastore write operations. See the billing docs for how the two relate: Pricing of Costs for Datastore Calls (second section).
This is the relevant part:
New Entity Put (per entity, regardless of entity size) = 2 writes + 2 writes per indexed property value + 1 write per composite index value
In your case Streams has 15 indexed properties resulting in: 2 + 15 * 2 = 32 write OPs per write API call.
Total per hour: 60 (requests/hour) * 25 (puts/request) * 32 (operations/put) = 48,000 datastore write operations per hour
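For comparison, here is a rough sketch (the field list is trimmed and the choices of which fields to index are assumptions, not the asker's actual model) of how an ndb model can mark properties as unindexed, so each put stops paying 2 write ops per indexed property:

from google.appengine.ext import ndb

class Streams(ndb.Model):
    # Keep indexes only on properties you actually filter or sort on.
    channel_id = ndb.StringProperty()
    name = ndb.StringProperty()
    # Everything else can be unindexed and no longer adds write ops per put.
    status = ndb.StringProperty(indexed=False)
    viewers = ndb.IntegerProperty(indexed=False)
    logo = ndb.StringProperty(indexed=False)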
It seems as though I've finally figured out what was going on, so I wanted to update here.
I found this older answer: https://stackoverflow.com/a/17079348/1452497.
I had missed, somewhere along the line, that the indexed properties were multiplying the writes by a factor of at least 10; I did not expect that. I didn't need everything indexed, and after turning off indexing in my model, the write ops dropped dramatically, down to about where I expect them.
Thanks guys!
That is 1,500 * 24 = 36,000 writes/day, which is very close to the daily quota.
Here's some code that should demonstrate what I'm trying to do:
import time
import datetime

recently_seen = {}
user_id = 10

while True:
    current_time = datetime.datetime.now()
    if user_id not in recently_seen:
        recently_seen[user_id] = current_time
        print 'seen {0}'.format(user_id)
    else:
        if current_time - recently_seen[user_id] > datetime.timedelta(seconds=5):
            recently_seen[user_id] = current_time
            print 'seen {0}'.format(user_id)
    time.sleep(0.1)
Basically, my program is listening on a socket for users. This is wrapped in a loop that spits out user_ids as it sees them. This means, I'm seeing user_ids every few milliseconds.
What I'm trying to do is log the users it sees and at what times. Saying it saw a user at 0.1 seconds and then again at 0.7 seconds is silly. So I want to implement a 5 second buffer.
It should find a user and, if the user hasn't been seen within 5 seconds, log them to a database.
The two solutions I've come up with are:
1) Keep the user_id in a dictionary (similar to the sample code above) and check against this. The problem is, if it's running for a few days and keeps finding new users, this will eventually use up my RAM.
2) Log them to a database and check against that. The problem with this is, it finds users every few milliseconds. I don't want to read the database every few milliseconds...
I need some way of creating a list of limited size. That limit would be 5 seconds. Any ideas on how to implement this?
How about removing the user from your dictionary once you log them to the database?
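A minimal sketch of that idea (the helper names and the prune-on-touch approach are mine, not from the answer above): drop entries older than 5 seconds whenever the dictionary is consulted, so it only ever holds users seen in the last few seconds.

import datetime

BUFFER = datetime.timedelta(seconds=5)

def prune(recently_seen, now):
    # Drop entries older than the buffer so the dict cannot grow without bound.
    for uid, seen_at in recently_seen.items():
        if now - seen_at > BUFFER:
            del recently_seen[uid]

def should_log(recently_seen, user_id, now):
    # True (and the sighting is recorded) only if this user has not been
    # seen within the last 5 seconds.
    prune(recently_seen, now)
    if user_id in recently_seen:
        return False
    recently_seen[user_id] = now
    return True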
Why aren't you using a DBM?
It will work like a dictionary but will be stored on the disk.
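For illustration (the module choice and file name are assumptions, not part of the answer), Python 2's anydbm gives a disk-backed, dictionary-like store; the one catch is that keys and values must be strings:

import anydbm
import time

db = anydbm.open('recently_seen.db', 'c')   # 'c' creates the file if it doesn't exist

user_id = '10'                              # dbm keys and values must be strings
now = time.time()

if not db.has_key(user_id) or now - float(db[user_id]) > 5.0:
    db[user_id] = str(now)
    print 'seen {0}'.format(user_id)

db.close()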