Get Intervals Between Two Times - python

A User will specify a time interval of n secs/mins/hours and then two times (start / stop).
I need to be able to take this interval, and then step through the start and stop times, in order to get a list of these times. Then after this, I will perform a database look up via a table.objects.filter, in order to retrieve the data corresponding to each time.
I'm making some ridiculously long algorithms at the moment and I'm positive there could be an easier way to do this. That is, a more pythonic way. Thoughts?

it fits nicely as a generator, too:
def timeseq(start,stop,interval):
    while start <= stop:
        yield start
        start += interval
used as:
for t in timeseq(start,stop,interval):
    table.objects.filter(t)
or:
data = [table.objects.filter(t) for t in timeseq(start,stop,interval)]

Are you looking for something like this? (pseudo code)
t = start
while t <= stop:  # "<=" rather than "!=", so the loop ends even if the interval never lands exactly on stop
    table.objects.filter(t)
    t += interval

What about ...
result = RelevantModel.objects.filter(relevant_field__in=[
    start + interval * i
    # end - start (not start - end); total_seconds() also counts whole days
    for i in range(int((end - start).total_seconds() // interval.total_seconds()))
])
... ?
I can't imagine this is very different from what you're already doing, but perhaps it's more compact (particularly if you weren't using foo__in=[bar] or a list comprehension). Of course start and end would be datetime.datetime objects and interval would be a datetime.timedelta object.
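As a rough, self-contained sketch of how the pieces above could fit together, the generator from the earlier answer can build the list of times, which is then handed to a single `__in` lookup. The dates, the 15-minute interval, and the model and field names in the final comment are made up for illustration; the query itself is left commented out because it needs a real Django model.

```python
from datetime import datetime, timedelta


def timeseq(start, stop, interval):
    """Yield start, start + interval, ... up to and including stop."""
    current = start
    while current <= stop:
        yield current
        current += interval


# Example values, chosen only for illustration.
start = datetime(2020, 1, 1, 9, 0)
stop = datetime(2020, 1, 1, 10, 0)
interval = timedelta(minutes=15)

times = list(timeseq(start, stop, interval))
for t in times:
    print(t)

# With a real model (names are hypothetical), one query could fetch every row:
# rows = RelevantModel.objects.filter(relevant_field__in=times)
```

Building the list first and issuing one `__in` query avoids one database round trip per time step, at the cost of holding all the times in memory.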


a simpler way to sum up each value

# total payments = the sum of monthly payments
# object-level method for calculation in Loan class
def totalPayments(self):
    # the monthly payment might be different depending on the period
    t = 0 # initialize the period
    m_sum = 0 # initialize the sum
    while t < self._term: # run until we reach the total term
        m_sum += self.monthlyPayment(t) # sum up each monthly payment
        t += 1 # go to next period
    return m_sum
The monthly payment might differ from period to period, so instead of simply multiplying it by the term, I chose to sum up each payment individually. Is there an easier way of doing this?
I thought to do this at first
sum(payment for payment in self.monthlyPayment(t) if term <= t)
But t is not initialized and won't be incremented to calculate each payment. So I was wondering if there is any easier approach that could possibly achieve the above functionality in a single line or so?
Your variable t increments by 1 each time, so why don't you use a range object?
for t in range(0, self._term): # You can omit the 0
    ...
So, if you want to keep your comprehension, the best way would be this:
sum(self.monthlyPayment(t) for t in range(self._term))
You're close, but you need to iterate over ts here, and range lets you bake in the end condition:
sum(self.monthlyPayment(t) for t in range(self._term))
or if you like using map (slightly less verbose since you've already got a method doing what you want, if less familiar to some, and perhaps trivially faster by avoiding bytecode execution during the loop):
sum(map(self.monthlyPayment, range(self._term)))
I think the proper statement would be
sum(self.monthlyPayment(t) for t in range(self._term))
self.monthlyPayment(t) doesn't return a sequence that you can iterate over. You need to loop over the range of arguments to this function and call it for each.
sum(self.monthlyPayment(t) for t in range(self._term))
That should do it.
m_sum = sum(self.monthlyPayment(t) for t in range(self._term))
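To see the one-liner in context, here is a minimal sketch with a stand-in `Loan` class. The payment schedule (100 for the first six periods, 120 afterwards) is invented purely for the example; only the method names `monthlyPayment` and `totalPayments` and the attribute `_term` come from the question.

```python
class Loan:
    def __init__(self, term):
        self._term = term  # number of periods

    def monthlyPayment(self, t):
        # Hypothetical schedule: the payment changes after period 5.
        return 100 if t < 6 else 120

    def totalPayments(self):
        # range(self._term) supplies t = 0, 1, ..., term - 1
        return sum(self.monthlyPayment(t) for t in range(self._term))

    def totalPaymentsWithMap(self):
        # Equivalent form using map, as suggested in one of the answers
        return sum(map(self.monthlyPayment, range(self._term)))


loan = Loan(term=12)
print(loan.totalPayments())
```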

How can I get my function to add together its output?

So this is my line of code so far,
def Adder (i,j,k):
    if i<=j:
        for x in range (i, j+1):
            print(x**k)
    else:
        print (0)
What it's supposed to do is take inputs (i,j,k) so that each number in the range [i,j] is raised to the power of k. For example, Adder(3,6,2) would be 3^2 + 4^2 + 5^2 + 6^2 and eventually output 86. I know how to get the function to output the list of numbers between i and j to the power of k, but I don't know how to make the function sum that output. So in the case of my given example, my output would be 9, 16, 25, 36.
Is it possible to make it so that under my if conditional I can generate an output that adds up the numbers in the range after they've been taken to the power of K?
If anyone can give me some advice I would really appreciate it! First week of any coding ever and I don't quite know how to ask this question so sorry for vagueness!
Question now Answered, thanks to everyone who responded so quickly!
You could use built-in function sum()
def adder(i,j,k):
    if i <= j:
        print(sum(x**k for x in range(i,j+1)))
    else:
        print(0)
The documentation is here
I'm not sure if this is what you want but
if i<=j:
    total = 0  # named "total" so it does not shadow the built-in sum()
    for x in range (i, j+1):
        total = total + x**k #total += x**k for simplicity
this will give you the sum of the powers
Looking at a few of the answers posted, they do a good job of giving you pythonic code for your solution, I thought I could answer your specific questions:
How can I get my function to add together its output?
A perhaps reasonable way is to iteratively and incrementally perform your calculations and store your interim solutions in a variable. See if you can visualize this:
Let's say (i,j,k) = (3,7,2)
We want the output to be: 135 (i.e., the result of the calculation 3^2 + 4^2 + 5^2 + 6^2 + 7^2)
Use a variable, call it result and initialize it to be zero.
As your for loop kicks off with x = 3, perform x^2 and add it to result. So result now stores the interim result 9. Now the loop moves on to x = 4. Same as the first iteration, perform x^2 and add it to result. Now result is 25. You can now imagine that result, by the time x = 7, contains the answer to the calculation 3^2+4^2+5^2+6^2. Let the loop finish, and you will find that 7^2 is also added to result.
Once loop is finished, print result to get the summed up answer.
A thing to note:
Consider where in your code you need to set and initialize the result variable.
If anyone can give me some advice I would really appreciate it! First week of any coding ever and I don't quite know how to ask this question so sorry for vagueness!
Perhaps a bit advanced for you, but helpful to be made aware I think:
Alright, let's get some nuance added to this discussion. Since this is your first week, I wanted to jot down some things I had to learn which have helped greatly.
Iterative and Recursive Algorithms
First off, identify that the solution is an iterative type of algorithm. Where the actual calculation is the same, but is executed over different cumulative data.
In this example, if we were to represent the calculation as an operation called ADDER(i,j,k), then:
ADDER(3,7,2) = ADDER(3,6,2)+ 7^2
ADDER(3,6,2) = ADDER(3,5,2) + 6^2
ADDER(3,5,2) = ADDER(3,4,2) + 5^2
ADDER(3,4,2) = ADDER(3,3,2) + 4^2
ADDER(3,3,2) = 0 + 3^2
Problems like these can be solved iteratively (like using a loop, be it while or for) or recursively (where a function calls itself using a subset of the data). In your example, you can envision a function calling itself and each time it is called it does the following:
1. calculates the square of j and
2. adds it to the value returned from calling itself with j decremented by 1, until
3. j < i, at which point it returns 0
Once the limiting condition (Point 3) is reached, a bunch of additions that were queued up along the way are triggered.
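The recursive description above could be sketched roughly as follows. The function name `adder_recursive` is made up for this example, and it uses the general power k rather than just the square. Note that Python limits recursion depth, so for large ranges the loop version is the safer choice.

```python
def adder_recursive(i, j, k):
    # Limiting condition: nothing left in the range [i, j]
    if j < i:
        return 0
    # Sum for [i, j - 1], plus the contribution of j itself
    return adder_recursive(i, j - 1, k) + j ** k


print(adder_recursive(3, 7, 2))
```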
Learn to Speak The Language before using Idioms
I may get down-voted for this, but you will encounter a lot of advice displaying pythonic idioms for standard solutions. The idiomatic solution for your example would be as follows:
def adder(i,j,k):
    return sum(x**k for x in range(i,j+1)) if i<=j else 0
But for a beginner this obscures a lot of the "science". It is far more rewarding to tread the simpler path as a beginner. Once you develop your own basic understanding of devising and implementing algorithms in python, then the idioms will make sense.
Just so you can lean into the above idiom, here's an explanation of what it does:
It calls the built-in function sum, which can operate over a list as well as any other iterable. We feed it as argument a generator expression, which "drip feeds" the sum function with x^k values as it iterates over range(i, j+1). In cases when N (the number of terms, j - i + 1) is arbitrarily large, building a full list first can result in huge memory overhead and performance disadvantages. Using a generator expression avoids these issues, because it produces one value at a time, only when the next value is needed, so the whole sequence never has to sit in memory at once.
Of course it only does all this if i <= j else it will return 0.
Lastly, make mistakes and ask questions. The community is great and very helpful
Well, do not use print. It is easy to modify your function like this,
def adder(i, j, k):
    if i<=j:
        s = 0
        for x in range (i, j+1):
            s += x**k
        return s # print(s) if you really want to
    else:
        return 0
Usually functions do not print anything. Instead they return values for their caller to either print or further process. For example, someone may want to find the value of Adder(3, 6, 2)+1, but if you return nothing, they have no way to do this, since the result is not passed to the program. A side note, do not capitalize functions. Those are for classes.
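A small sketch of why returning matters: once the function returns its result, the caller can keep computing with it or print it as needed. The variable name `result` is chosen only for this illustration.

```python
def adder(i, j, k):
    if i > j:
        return 0
    total = 0
    for x in range(i, j + 1):
        total += x ** k
    return total


# The caller decides what to do with the value.
result = adder(3, 6, 2) + 1
print(result)
```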

Don't check variable every iteration

I currently have a for-loop, which is going through an incredible number of iterations to check something, and when it goes to a new iteration, I need it to check whether or not a variable I have is the same size of the current iteration.
Here is an example code of what I'm doing:
import datetime
now = datetime.datetime.now()
printcounter = 0
for i in range(3,100000000000+1,2):
    if (printcounter == 1000000000):
        print(i,"at %d" %now.hour, "hours and %d" % now.minute, "minutes.")
        printcounter = 0
    else:
        #Do operation
        printcounter += 1
However, since it's going through possibly millions of math-heavy operations before I get my answer, I noticed that stripping the 'printcounter' variable from this code, and so giving up the progress report, gave me a significant speedup, sometimes by whole minutes.
Is there any way to check whether the 'printcounter' variable is equal to 10000 without making that check on every single iteration?
I personally can't think of any way without resorting to nested for loops, which can get very messy and which I'd rather avoid.
By the way, I'm using Windows 8.1, Python 3.5.1, if that makes any difference.
Edit:
I understand that it takes a significant portion of time to print, however, if I instead print to a file; my harddisk being very fast, then I still get the same, albeit reduced, difference in time. Also, I have been wanting to get the solution to this implemented in a lot of other scripts, so even if it's not a major problem here, I'd still like to know how to do it.
Edit 2:
Perhaps it's my fault for not being clear. I was looking to see if it was possible to check a value every once in a while, not every single time. For example, I don't want my code to check if 'printcounter' is equal to 1000000000 when it's 1, that's ridiculous. I know machines operate ridiculously fast, and so it doesn't matter, but I was curious to see if it was possible to reduce the number of times it checks that way, rather than having a dumb code which allows itself to be sloppy or lazy just because it's quick enough to correct for it.
If you don't want to check the variable every iteration, make it unnecessary...
by doing something like this instead:
import datetime
iterations = 100000000000
subiterations = 10000
chunks, remaining = divmod(iterations, subiterations)
now = datetime.datetime.now()
printcounter = 0
for i in range(chunks):
    for j in range(subiterations):
        #Do operation
        pass
    printcounter += subiterations
    print('{:,d} at {} hours {} minutes'.format(printcounter, now.hour, now.minute))
if remaining:
    for j in range(remaining):
        #Do operation
        pass
    printcounter += remaining
    print('{:,d} at {} hours {} minutes'.format(printcounter, now.hour, now.minute))
The speedup isn't because of checking that variable. It's because of the print statement itself. So no, there's no way to speed it up further besides removing that statement.
And to answer your specific question explicitly: you could restructure your code such that it isn't necessary to make that check, for example, using nested for loops. But that will likely be slower. The time it takes to check that one boolean comparison is very small.
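Rather than guessing which structure is faster, one option is to measure both on a scaled-down loop. The sketch below is only an illustration: the function names, the loop sizes, and the use of a counter as a stand-in for the progress print are all assumptions, and the timings will vary by machine and Python version.

```python
import timeit


def with_check(n, every):
    """Single loop that tests a counter on every iteration."""
    reports = 0
    counter = 0
    for _ in range(n):
        counter += 1
        if counter == every:
            reports += 1  # stand-in for the progress print
            counter = 0
    return reports


def chunked(n, every):
    """Nested loops: the inner loop contains no check at all."""
    reports = 0
    chunks, remaining = divmod(n, every)
    for _ in range(chunks):
        for _ in range(every):
            pass  # stand-in for the real operation
        reports += 1  # stand-in for the progress print
    for _ in range(remaining):
        pass
    return reports


n, every = 200000, 10000
print("with check:", timeit.timeit(lambda: with_check(n, every), number=5))
print("chunked:   ", timeit.timeit(lambda: chunked(n, every), number=5))
```

Both functions report progress the same number of times, so any difference in the printed timings comes from the loop structure alone.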
Since printcounter is incremented at every iteration, why not use nested for loops?
Something roughly like this:
import datetime
now = datetime.datetime.now()
for j in range(100):
    print(j, "at %d" %now.hour, "hours and %d" % now.minute, "minutes.")
    for i in range(1000000000):
        #Do operation
        pass  # placeholder so the loop body is valid Python
It's not going to make much difference because the int math is small compared to the actual print statement, but:
import datetime
now = datetime.datetime.now()
step = 2
init = 3
for i in range(init, 100000000000+1, step):
    if i % (10000*step) == init: #since we start at 3 and step by 2; parentheses needed because % binds as tightly as *
        print(i,"at %d" %now.hour, "hours and %d" % now.minute, "minutes.")
    # Do Stuff
This structure will eliminate a few operations, but none of them are slow operations. But in terms of code structuring this is how I'd do it.

nearest timestamp price - ready data structure in Python?

Price interpolation. Python data structure for efficient near miss searches?
I have price data
[1427837961000.0, 243.586], [1427962162000.0, 245.674], [1428072262000.0, 254.372], [1428181762000.0, 253.366], ...
with the first dimension a timestamp, and the second a price.
Now I want to know the price which is nearest to a given timestamp e.g. to 1427854534654.
What is the best Python container, data structure, or algorithm to solve this many hundred or thousand times per second? It is a standard problem, and has to be solved in many applications, so there should be a ready and optimized solution.
I have Googled, and found only bits and pieces that I could build upon - but I guess this question is so common, that the whole data structure should be ready as a module?
EDIT: Solved.
I used JuniorCompressor's solution with my bugfix for future dates.
The performance is fantastic:
3000000 calls took 12.82 seconds, so 0.00000427 per call (length of data = 1143).
Thanks a lot! StackOverFlow is great, and you helpers are the best!
It is very common for this problem to have your data sorted by the timestamp value and then binary search for every possible query. Binary search can be performed using the bisect module:
import bisect
data = [
    [1427837961000.0, 243.586],
    [1427962162000.0, 245.674],
    [1428072262000.0, 254.372],
    [1428181762000.0, 253.366]
]
data.sort(key=lambda l: l[0]) # Sort by timestamp
timestamps = [l[0] for l in data] # Extract timestamps
def find_closest(t):
    idx = bisect.bisect_left(timestamps, t) # Find insertion point
    if idx == len(timestamps): # t is later than every timestamp
        idx -= 1
    # Check which timestamp with idx or idx - 1 is closer
    elif idx > 0 and abs(timestamps[idx] - t) > abs(timestamps[idx - 1] - t):
        idx -= 1
    return data[idx][1] # Return price
We can test like this:
>>> find_closest(1427854534654)
243.586
If we have n queries and m timestamp values, then each query needs O(log m) time. So the total time needed is O(n * log m).
In the above algorithm we search between two indexes. If we use only the midpoints of the timestamp intervals, we can simplify even more and create a faster search:
midpoints = [(a + b) / 2 for a, b in zip(timestamps, timestamps[1:])]
def find_closest_through_midpoints(t):
    return data[bisect.bisect_left(midpoints, t)][1]
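As a sanity check on the midpoint idea, the sketch below compares it with a brute-force scan over all entries. The helper `find_closest_brute_force` and the list of query timestamps are made up for this illustration, including queries before the first and after the last timestamp.

```python
import bisect

data = [
    [1427837961000.0, 243.586],
    [1427962162000.0, 245.674],
    [1428072262000.0, 254.372],
    [1428181762000.0, 253.366],
]
timestamps = [row[0] for row in data]
# One midpoint between each pair of neighbouring timestamps
midpoints = [(a + b) / 2 for a, b in zip(timestamps, timestamps[1:])]


def find_closest_through_midpoints(t):
    # There are len(data) - 1 midpoints, so the index is always valid
    return data[bisect.bisect_left(midpoints, t)][1]


def find_closest_brute_force(t):
    # O(m) reference implementation used only for comparison
    return min(data, key=lambda row: abs(row[0] - t))[1]


queries = [0, 1427854534654, 1428000000000, 1428100000000, 9999999999999]
results = [find_closest_through_midpoints(q) for q in queries]
print(results)
```

Because there is always one fewer midpoint than data points, the bisect result can never run past the end of `data`, so queries in the future need no special case here.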
Try this to get nearest value
l = [ [1427837961000.0, 243.586], [1427962162000.0, 245.674], [1428072262000.0, 254.372], [1428181762000.0, 253.366]]
check_value = 1427854534654
>>> min(l, key=lambda x: abs(x[0] - check_value))[0]
1427837961000.0

fill missing values in python array

Using: Python 2.7.1 on Windows
Hello, I fear this question has a very simple answer, but I just can't seem to find an appropriate and efficient solution (I have limited Python experience). I am writing an application that just downloads historic weather data from a third-party API (Wunderground). The thing is, sometimes there's no value for a given hour (e.g., we have 20 degrees at 5 AM, no value for 6 AM, and 21 degrees at 7 AM). I need to have exactly one temperature value for every hour, so I figured I could just fit the data I do have and evaluate the fit at the points I'm missing (using SciPy's polyfit). That's all fine; however, I am having trouble getting my program to detect whether the list has missing hours and, if so, to insert the missing hour and calculate a temperature value. I hope that makes sense.
My attempt at handling the hours and temperatures list is the following:
from scipy import polyfit
# Evaluate simple quadratic function
def tempcal (array,x):
    return array[0]*x**2 + array[1]*x + array[2]
# Sample data, note it has missing hours.
# My final hrs list should look like range(25), with matching temperatures at every point
hrs = [1,2,3,6,9,11,13,14,15,18,19,20]
temps = [14.0,14.5,14.5,15.4,17.8,21.3,23.5,24.5,25.5,23.4,21.3,19.8]
# Fit coefficients
coefs = polyfit(hrs,temps,2)
# Cycle control
i = 0
done = False
while not done:
    # It has missing hour, insert it and calculate a temperature
    if hrs[i] != i:
        hrs.insert(i,i)
        temps.insert(i,tempcal(coefs,i))
    # We are done, leave now
    if i == 24:
        done = True
    i += 1
I can see why this isn't working, the program will eventually try to access indexes out of range for the hrs list. I am also aware that modifying list's length inside a loop has to be done carefully. Surely enough I am either not being careful enough or just overlooking a simpler solution altogether.
In my googling attempts to help myself I came across pandas (the library) but I feel like I can solve this problem without it, (and I would rather do so).
Any input is greatly appreciated. Thanks a lot.
When i equals 21, hrs[i] refers to the twenty-second element of the list, but at that point the list holds only 21 values, so the lookup raises an IndexError.
In future I recommend using PyCharm with breakpoints for debugging, or a try-except construction.
Not sure I would recommend this way of interpolating values. I would have used the closest points surrounding the missing values instead of the whole dataset. But using numpy, your proposed way is fairly straightforward.
import numpy as np
hrs = np.array(hrs)
temps = np.array(temps)
newTemps = np.empty((25))
newTemps.fill(-300) #just fill it with some invalid data, temperatures don't go this low so it should be safe.
#fill in original values
newTemps[hrs - 1] = temps
#Get indices of missing values
missing = np.nonzero(newTemps == -300)[0]
#Calculate and insert missing values.
newTemps[missing] = tempcal(coefs, missing + 1)
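The alternative mentioned in this answer, using only the points surrounding each gap, could look roughly like the pure-Python sketch below. The function name `fill_hours` and the choice to repeat the nearest known value before the first and after the last reading are assumptions made for this example; NumPy's `np.interp` provides the same kind of linear interpolation if NumPy is already in use.

```python
import bisect


def fill_hours(hrs, temps, hours=range(25)):
    """Return one temperature per hour, interpolating linearly between
    the nearest known readings (hrs must be sorted)."""
    filled = []
    for h in hours:
        pos = bisect.bisect_left(hrs, h)
        if pos < len(hrs) and hrs[pos] == h:
            filled.append(temps[pos])    # reading exists for this hour
        elif pos == 0:
            filled.append(temps[0])      # before the first reading
        elif pos == len(hrs):
            filled.append(temps[-1])     # after the last reading
        else:
            h0, h1 = hrs[pos - 1], hrs[pos]
            t0, t1 = temps[pos - 1], temps[pos]
            # float() keeps the division exact on Python 2 as well
            filled.append(t0 + (t1 - t0) * (h - h0) / float(h1 - h0))
    return filled


hrs = [1, 2, 3, 6, 9, 11, 13, 14, 15, 18, 19, 20]
temps = [14.0, 14.5, 14.5, 15.4, 17.8, 21.3, 23.5, 24.5, 25.5, 23.4, 21.3, 19.8]
filled = fill_hours(hrs, temps)
print(filled)
```

Unlike the insert-while-looping approach in the question, this builds a new list and never modifies `hrs` or `temps`, which sidesteps the out-of-range index problem.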
