Geometric rounding with numpy/quantize? - python

I've got a pandas series of data which is a curve.
I want to round it in such a way as to make it 'stepped'. Furthermore, I want the steps to be roughly within 10% of the present value. (Another way of putting this is I want the steps to increase in increments of 10%, i.e. geometrically).
I've written something that's iterative and slow:
import numpy as np
import pandas as pd

def chunk_trades(A):
    try:
        last = A[0]
    except:
        print(A)
        raise
    new = []
    # Series.items() replaces iteritems(), which was removed in pandas 2.0
    for x in A.items():
        if not last or np.abs((x[1]-last)/last) > 0.1:
            new.append(x[1])
            last = x[1]
        else:
            new.append(last)
    s = pd.Series(new, index=A.index)
    return s
I don't want to use this code.
I'm trying to find a faster, pythonic way of doing this. I've tried using numpy.digitize() but I don't think that's what I'm looking for. Any ideas for how best to approach this?

OK, I think the solution should be something like:
np.exp(np.around(np.log(np.abs(j)), decimals=1)) * np.sign(j)
Map to logarithmic space, do the rounding, transform back.
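A minimal sketch of the log-space idea above, assuming the series contains no zeros. The helper name geometric_round and its step parameter are made up for illustration and are not part of the original answer. Rounding the natural log to one decimal gives steps of a factor e^0.1 (about 1.105); dividing by log(1 + step) before rounding gives steps of exactly 10%. Unlike the iterative version, this snaps each value to a fixed grid of levels rather than holding the last emitted value.

```python
import numpy as np
import pandas as pd

def geometric_round(values, step=0.10):
    """Snap each value to the nearest power of (1 + step), keeping its sign.

    Hypothetical helper; assumes no zeros in the input (log(0) is -inf).
    """
    values = np.asarray(values, dtype=float)
    base = np.log1p(step)                                  # log(1.1) for 10% steps
    exponents = np.around(np.log(np.abs(values)) / base)   # round in log space
    return np.sign(values) * np.exp(exponents * base)      # map back

curve = pd.Series([100.0, 103.0, 108.0, 125.0, -50.0])
stepped = pd.Series(geometric_round(curve), index=curve.index)
```

Each output is within roughly 5% of its input, because the grid levels are 10% apart.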

Related

Setting boundary limits to multiple operations with random number generators

I believe my problem is straightforward and there must be an easy way to solve it, but as I am quite new to Python I could not sort it out on my own.
I will post a made-up example rather than the complex script I am currently working on, in case you want to test it yourself. Please consider the following:
import numpy as np
nData = 100
sigma_alpha = np.array([1,1])
alpha = [-23,0]
data_alpha1 = np.random.randn(nData)*sigma_alpha[0]+alpha[0]
data_alpha2 = np.random.randn(nData)*sigma_alpha[1]+alpha[1]
My issue is that I have to limit data_alpha1 and data_alpha2 to -25 as the lower limit and 25 as the upper limit. That is, all the elements of both arrays have to lie between those two values. So the solution I am looking for also has to handle a case like the following, where multiple values will be beyond 25:
nData = 100
sigma_alpha = np.array([1,1])
alpha = [25,0]
data_alpha1 = np.random.randn(nData)*sigma_alpha[0]+alpha[0]
data_alpha2 = np.random.randn(nData)*sigma_alpha[1]+alpha[1]
The variable alpha is set in a loop, so its value is dynamic and constantly being updated.
To sum up: I am trying to find a way to make sure that data_alpha1 and data_alpha2 contain only values between -25 and 25. If any value violates that condition, it should be set to the boundary it exceeds. For example, if an element of data_alpha1 is below -25, it should be replaced by -25.
I hope I managed to be succinct and precise. I would really appreciate your help on this one!
Like this:
data_alpha1[data_alpha1 > 25] = 25
data_alpha1[data_alpha1 < -25] = -25
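The two masked assignments above can also be written as a single call to np.clip, which returns a new array and leaves the original untouched. This is a small sketch; the seeded generator is only there to make the example repeatable and is not part of the original question.

```python
import numpy as np

rng = np.random.default_rng(0)                     # seeded only for repeatability
data_alpha1 = rng.standard_normal(100) * 1 + 25    # centred on the upper limit

# Values below -25 become -25, values above 25 become 25, the rest are unchanged
clipped = np.clip(data_alpha1, -25, 25)
```

Since alpha changes inside a loop, the same call can be applied right after each array is generated.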

How can I get my function to add together its output?

So this is my line of code so far,
def Adder (i,j,k):
    if i<=j:
        for x in range (i, j+1):
            print(x**k)
    else:
        print (0)
What it's supposed to do is take inputs (i,j,k) and raise each number in [i,j] to the power of k, then add the results. For example, Adder(3,6,2) would be 3^2 + 4^2 + 5^2 + 6^2 and should eventually output 86. I know how to get the function to output the list of numbers between i and j to the power of k, but I don't know how to make the function sum that output. So for my example, my current output is 9, 16, 25, 36.
Is it possible to make it so that under my if conditional I can generate an output that adds up the numbers in the range after they've been taken to the power of K?
If anyone can give me some advice I would really appreciate it! First week of any coding ever and I don't quite know how to ask this question so sorry for vagueness!
Question now Answered, thanks to everyone who responded so quickly!
You could use built-in function sum()
def adder(i,j,k):
    if i <= j:
        print(sum(x**k for x in range(i,j+1)))
    else:
        print(0)
The documentation is here
I'm not sure if this is what you want but
if i<=j:
    total = 0  # named total so the built-in sum() is not shadowed
    for x in range (i, j+1):
        total = total + x**k  # or total += x**k for simplicity
this will give you the sum of the powers
Looking at a few of the answers posted, they do a good job of giving you pythonic code for your solution, I thought I could answer your specific questions:
How can I get my function to add together its output?
A perhaps reasonable way is to iteratively and incrementally perform your calculations and store your interim solutions in a variable. See if you can visualize this:
Let's say (i,j,k) = (3,7,2)
We want the output to be: 135 (i.e., the result of the calculation 3^2 + 4^2 + 5^2 + 6^2 + 7^2)
Use a variable, call it result and initialize it to be zero.
As your for loop kicks off with x = 3, perform x^2 and add it to result. So result now stores the interim result 9. Now the loop moves on to x = 4. Same as the first iteration, perform x^2 and add it to result. Now result is 25. You can now imagine that result, by the time x = 7, contains the answer to the calculation 3^2+4^2+5^2+6^2. Let the loop finish, and you will find that 7^2 is also added to result.
Once the loop is finished, print result to get the summed-up answer.
A thing to note:
Consider where in your code you need to set and initialize the result variable.
If anyone can give me some advice I would really appreciate it! First week of any coding ever and I don't quite know how to ask this question so sorry for vagueness!
Perhaps a bit advanced for you, but helpful to be made aware I think:
Alright, let's get some nuance added to this discussion. Since this is your first week, I wanted to jot down some things I had to learn which have helped greatly.
Iterative and Recursive Algorithms
First off, recognize that the solution is an iterative type of algorithm, where the calculation itself stays the same but is executed over different, cumulative data.
In this example, if we were to represent the calculation as an operation called ADDER(i,j,k), then:
ADDER(3,7,2) = ADDER(3,6,2) + 7^2
ADDER(3,6,2) = ADDER(3,5,2) + 6^2
ADDER(3,5,2) = ADDER(3,4,2) + 5^2
ADDER(3,4,2) = ADDER(3,3,2) + 4^2
ADDER(3,3,2) = 0 + 3^2
Problems like these can be solved iteratively (like using a loop, be it while or for) or recursively (where a function calls itself using a subset of the data). In your example, you can envision a function calling itself and each time it is called it does the following:
1. calculates the square of j and
2. adds it to the value returned from calling itself with j decremented by 1, until
3. j < i, at which point it returns 0
Once the limiting condition (Point 3) is reached, a bunch of additions that were queued up along the way are triggered.
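The recursive description above can be sketched as follows. The name adder_recursive is made up for illustration, and the function raises j to the power k in general rather than only squaring it.

```python
def adder_recursive(i, j, k):
    """Sum x**k for x in [i, j] by recursing on the upper bound j."""
    if j < i:
        # Limiting condition: the range is empty, so there is nothing to add
        return 0
    # Add j**k to the result for the range [i, j - 1]
    return adder_recursive(i, j - 1, k) + j ** k

print(adder_recursive(3, 7, 2))  # 135
```

Each call waits for the call below it to return, so the additions happen on the way back up. For very wide ranges the loop version is the safer choice, since Python limits recursion depth (1000 frames by default).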
Learn to Speak The Language before using Idioms
I may get down-voted for this, but you will encounter a lot of advice displaying pythonic idioms for standard solutions. The idiomatic solution for your example would be as follows:
def adder(i,j,k):
    return sum(x**k for x in range(i,j+1)) if i<=j else 0
But for a beginner this obscures a lot of the "science". It is far more rewarding to tread the simpler path as a beginner. Once you develop your own basic understanding of devising and implementing algorithms in python, then the idioms will make sense.
Just so you can lean into the above idiom, here's an explanation of what it does:
It calls the built-in function sum, which can operate over a list as well as any other iterable. We feed it a generator expression, which "drip feeds" sum with x^k values as it iterates over range(i, j+1). When N (which is j-i) is very large, building a full list first can cost a lot of memory and hurt performance. A generator expression avoids this: it produces the next value only when it is needed, so the whole sequence is never held in memory at once.
Of course, it only does all this if i <= j; otherwise it returns 0.
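To make the memory point concrete, here is a small sketch comparing a list with a generator expression over the same values. The size n is arbitrary, and sys.getsizeof reports only the size of the container object itself, not of the integers it refers to.

```python
import sys

n = 100_000
as_list = [x ** 2 for x in range(n)]   # builds and stores every value up front
as_gen = (x ** 2 for x in range(n))    # produces values one at a time, on demand

list_size = sys.getsizeof(as_list)     # grows with n
gen_size = sys.getsizeof(as_gen)       # small and independent of n

total_from_gen = sum(as_gen)           # sum() consumes the generator lazily
print(list_size, gen_size)
```

The generator can only be consumed once; after sum(as_gen) it is exhausted.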
Lastly, make mistakes and ask questions. The community is great and very helpful
Well, do not use print. It is easy to modify your function like this:
def adder(i, j, k):
    if i<=j:
        s = 0
        for x in range (i, j+1):
            s += x**k
        return s # print(s) if you really want to
    else:
        return 0
Usually functions do not print anything. Instead they return values for their caller to either print or process further. For example, someone may want to compute Adder(3, 6, 2)+1, but if the function returns nothing they have no way to do this, since the result is never passed back to the program. A side note: do not capitalize function names. By convention, capitalized names are for classes.

Efficient way to find index of interval

I'm writing a spline class in Python. The method that calculates the spline-interpolated value requires the index of the closest x data points. Currently a simplified version looks like this:
def evaluate(x):
    for ii in range(N - 1): # N = len(x_data); stop early so x_data[ii+1] stays in range
        if x_data[ii] <= x <= x_data[ii+1]:
            return calc(x,ii)
So it iterates through the list of x_data points until it finds the lower index ii of interval in which x lies and uses that in the function calc, which performs the spline interpolation. While functional, it seems like this would be inefficient for large x_data arrays if x is close to the end of the data set. Is there a more efficient or elegant way to perform the same functionality, which does not require every interval to be checked iteratively?
Note: x_data may be assumed to be sorted so x_data[ii] < x_data[ii+1], but is not necessarily equally spaced.
This is exactly what bisect is for: https://docs.python.org/2/library/bisect.html
from bisect import bisect
index = bisect(x_data,x)
# I don't think you actually need the values of the 2 closest points, but if you do, here they are
point_less = x_data[index-1] # note this breaks if index is 0, so you probably want a special case for that
point_more = x_data[index] # likewise breaks if index is len(x_data)
closest_value = min([point_less,point_more],key=lambda y:abs(x-y))
Alternatively, you could write your own binary search (in fact, I'm pretty sure that's what bisect uses under the hood). It should be worst case O(log n), assuming your input array is already sorted.
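A minimal sketch of how the bisect lookup could return the lower interval index ii that the question asks for, including the edge cases at both ends. The function name find_interval and the choice to raise ValueError outside the data range are assumptions for illustration.

```python
from bisect import bisect_right

def find_interval(x_data, x):
    """Return ii such that x_data[ii] <= x <= x_data[ii + 1], for sorted x_data."""
    if not x_data[0] <= x <= x_data[-1]:
        raise ValueError("x lies outside the data range")
    # bisect_right gives the first index whose value is strictly greater than x
    ii = bisect_right(x_data, x) - 1
    # x equal to the last data point would give the last index; use the last interval
    return min(ii, len(x_data) - 2)

x_data = [0.0, 1.0, 2.5, 4.0, 7.0]
print(find_interval(x_data, 3.0))  # 2
```

The result can then be passed to calc(x, ii) in place of the linear scan.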

fill missing values in python array

Using: Python 2.7.1 on Windows
Hello, I fear this question has a very simple answer, but I just can't seem to find an appropriate and efficient solution (I have limited Python experience). I am writing an application that just downloads historic weather data from a third-party API (Wunderground). The thing is, sometimes there's no value for a given hour (e.g., we have 20 degrees at 5 AM, no value for 6 AM, and 21 degrees at 7 AM). I need to have exactly one temperature value for every hour, so I figured I could just fit the data I do have and evaluate the points I'm missing (using SciPy's polyfit). That's all fine; however, I am having trouble getting my program to detect whether the list has missing hours and, if so, insert the missing hour and calculate a temperature value. I hope that makes sense.
My attempt at handling the hours and temperatures list is the following:
from scipy import polyfit
# Evaluate simple quadratic function
def tempcal (array,x):
    return array[0]*x**2 + array[1]*x + array[2]
# Sample data, note it has missing hours.
# My final hrs list should look like range(25), with matching temperatures at every point
hrs = [1,2,3,6,9,11,13,14,15,18,19,20]
temps = [14.0,14.5,14.5,15.4,17.8,21.3,23.5,24.5,25.5,23.4,21.3,19.8]
# Fit coefficients
coefs = polyfit(hrs,temps,2)
# Cycle control
i = 0
done = False
while not done:
    # It has a missing hour, insert it and calculate a temperature
    if hrs[i] != i:
        hrs.insert(i,i)
        temps.insert(i,tempcal(coefs,i))
    # We are done, leave now
    if i == 24:
        done = True
    i += 1
I can see why this isn't working, the program will eventually try to access indexes out of range for the hrs list. I am also aware that modifying list's length inside a loop has to be done carefully. Surely enough I am either not being careful enough or just overlooking a simpler solution altogether.
In my googling attempts to help myself I came across pandas (the library) but I feel like I can solve this problem without it, (and I would rather do so).
Any input is greatly appreciated. Thanks a lot.
When i equals 21, hrs[i] asks for the twenty-second value in the list, but there are only 21 values.
In future I recommend using PyCharm with breakpoints for debugging, or a try-except construction.
Not sure I would recommend this way of interpolating values. I would have used the closest points surrounding the missing values instead of the whole dataset. But using numpy, your proposed way is fairly straightforward.
import numpy as np
hrs = np.array(hrs)
temps = np.array(temps)
newTemps = np.empty((25))
newTemps.fill(-300) # just fill it with some invalid data; temperatures don't go this low so it should be safe
# Fill in original values
newTemps[hrs - 1] = temps
# Get indices of missing values
missing = np.nonzero(newTemps == -300)[0]
# Calculate and insert missing values
newTemps[missing] = tempcal(coefs, missing + 1)
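The answer mentions preferring the points surrounding each gap over a fit to the whole dataset. One way to sketch that, swapping the quadratic fit for piecewise-linear interpolation, is np.interp. The variable names all_hrs and filled are made up here, and the hour range is limited to what the sample data covers, because np.interp holds the end values constant outside that range rather than extrapolating.

```python
import numpy as np

hrs = np.array([1, 2, 3, 6, 9, 11, 13, 14, 15, 18, 19, 20])
temps = np.array([14.0, 14.5, 14.5, 15.4, 17.8, 21.3,
                  23.5, 24.5, 25.5, 23.4, 21.3, 19.8])

all_hrs = np.arange(hrs.min(), hrs.max() + 1)   # every hour from 1 to 20
# Known hours keep their measured value; gaps are filled linearly
# from the two neighbouring measurements
filled = np.interp(all_hrs, hrs, temps)
```

For example, hour 4 lies a third of the way between hour 3 (14.5) and hour 6 (15.4), so it is filled with 14.8.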

value <= maximum

I wonder, is it possible to achieve something similar to this using bit operations:
if a > maximum: a = maximum
where 'maximum' can be an arbitrary number?
I have many similar lines in my current code. Of course, I could have used:
def foo(a, max=512): return a if a<max else max
Just curious if there's a more elegant and efficient way.
There's no need to define your own function for this, min and max are already built-in:
a = min(maximum, a)
As per Raymond's answer, it is also possible to use bit operations:
a = maximum ^ ((a ^ maximum) & -(a < maximum))
But in the vast majority of cases, the performance benefit isn't really worth making the code very hard to understand. Also, this only works for integers, whereas the min function can be used for all comparable types.
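A quick sketch that checks the bitwise expression against min on random integers. The wrapper name bitwise_min is made up for illustration. The trick relies on the comparison a < maximum evaluating to True or False, which negate to -1 (all bits set) or 0.

```python
import random

def bitwise_min(a, maximum):
    # mask is -1 (all bits set) when a < maximum, else 0
    mask = -(a < maximum)
    # With mask == -1 the expression reduces to a; with mask == 0 it stays maximum
    return maximum ^ ((a ^ maximum) & mask)

random.seed(0)  # seeded only for repeatability
pairs = [(random.randint(-1000, 1000), random.randint(-1000, 1000))
         for _ in range(1000)]
all_match = all(bitwise_min(a, m) == min(a, m) for a, m in pairs)
print(all_match)  # True
```

In Python this is unlikely to be faster than the built-in min, so it is mainly of interest as a curiosity.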
Using max and min would make for clear code.
That being said, it is possible to use bit-twiddling: http://graphics.stanford.edu/~seander/bithacks.html#IntegerMinOrMax
