I am trying to convert a value that lies within one range of integers to the corresponding integer in another given range. I want the output value to correspond proportionally to the input value. To make it clearer, here is an example of the problem I'd like to solve:
Suppose I have three integers, say 10, 28 and 13, that lie in the range (10, 28): 10 is the minimum possible value and 28 the maximum possible value.
As output, I want each integer converted to the 'corresponding' number in the range (5, 20). In this example, 10 would be converted to 5, 28 to 20, and 13 to a value between 5 and 20, rescaled so that the proportions are kept intact. Afterwards, this value is converted to an integer.
How is it possible to achieve such 'rescaling' in Python? I tried the usual calculation, (value / max of first range) * max of second range, but it gives wrong values except in rare cases.
At first I had problems with integer division in Python, but even after correcting that, I still get wrong values. For instance, using the values from the example, int((float(10)/28)*20) gives me 7 as a result, where it should return 5 because 10 is the minimum possible value of the first range.
I feel like the answer is fairly obvious (in terms of logic and math) and that I am missing something.
If you are getting wrong results, you are likely using Python 2, where dividing two integers always yields an integer (and therefore you get lots of rounding errors and zeros when it comes to scaling factors).
Python 3 corrected this so that division returns a float. In Python 2 the workaround is either to put from __future__ import division on the first line of your code (preferred), or to explicitly convert at least one of the operands to float on every division.
from __future__ import division

def renormalize(n, range1, range2):
    # Width of the source range and of the target range.
    delta1 = range1[1] - range1[0]
    delta2 = range2[1] - range2[0]
    # Shift n to the start of the source range, rescale it, then shift into the target range.
    return (delta2 * (n - range1[0]) / delta1) + range2[0]
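For example, feeding in the values from the question (and truncating the middle result to an int, as the question does):

>>> renormalize(10, (10, 28), (5, 20))
5.0
>>> renormalize(28, (10, 28), (5, 20))
20.0
>>> int(renormalize(13, (10, 28), (5, 20)))
7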
So, I am getting the data from a txt file and I want to extract specific data from the whole set. In the code, I am trying to grab it by specifying which indexes and which frequencies are being used for those indexes. But my log is showing a negative value and I don't know how to fix that. Code is below, thanks!
import numpy as np

# bigcluster is the array loaded from the txt file (loading code not shown)
indexes = [9, 10, 11, 12, 13]
frequenciesmh = [151, 610, 1400, 4860, 18000]
frequenciesgh = [i * 10**-3 for i in frequenciesmh]
bigclusterallfluxes = bigcluster[indexes]
bigclusterlogflux151mhandredshift = [i[indexes] for i in bigcluster]
shiftedlogflux151mh = [np.interp(np.log10((151 * 10**-3) * i[0]), np.log10(frequenciesgh), i[1:])
                       for i in bigclusterlogflux151mhandredshift]
shiftflux151mh = [10**i for i in shiftedlogflux151mh]
bigclusterflux151mhandredshift = np.array(list(zip(shiftflux151mh, np.transpose(bigcluster)[9])))
I don't know what you are trying to fix exactly, but I would definitely NOT flip the sign of the negative values, as that would make the power positive in every case (if you know some maths you will understand that this would turn 1/16 into 16, while leaving 16 as 16).
What you probably want, since you are working with frequencies (which are always between 0 and 1 if you normalize them; to do this, divide each of them by the sum of all of them, so your logarithm will always be smaller than or equal to 0), is to make them all positive by taking the negative log10 of your probability. This is quite a common quantity to work with; then 1 corresponds to 1/10, 2 to 1/100, and so on (in genetics, at least, these are called Phred values, I believe).
Summarizing: always take the minus log, not the log:
-math.log(0.0001)
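As a minimal sketch of that idea (the raw counts below are made up for illustration):

import math

# Hypothetical raw counts; any positive numbers work for illustration.
raw = [151, 610, 1400, 4860, 18000]

# Normalize so the values sum to 1 (each value is now between 0 and 1).
total = sum(raw)
probs = [x / float(total) for x in raw]

# Take the negative log10 so every value becomes positive (a Phred-like score).
neg_log10 = [-math.log10(p) for p in probs]
print(neg_log10)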
The abs() function is what you are looking for.
I have a function which takes an array-like argument and a value argument as inputs. During the unit tests of this function (I use hypothesis), if a very large value is thrown at it (one that cannot be handled by np.float128), the function fails.
What is a good way to detect such values and handle them properly?
Below is the code for my function:
import numpy as np

def find_nearest(my_array, value):
    """ Find the nearest value in an unsorted array.
    """
    # Convert to numpy array and drop NaN values.
    my_array = np.array(my_array, copy=False, dtype=np.float128)
    my_array = my_array[~np.isnan(my_array)]
    return my_array[(np.abs(my_array - value)).argmin()]
Example which produces a wrong result:
find_nearest([0.0, 1.0], 1.8446744073709556e+19)
Returns 0.0, but the correct answer is 1.0.
If I cannot return the correct answer, I would at least like to be able to raise an exception. The problem is that right now I do not know how to identify bad inputs. A more general answer that fits other cases is preferable, as I see this as a recurring issue.
Beware, float128 isn't actually 128-bit precision! It is in fact a longdouble implementation: https://en.wikipedia.org/wiki/Extended_precision. The precision of this storage type is 63 bits, which is why it fails around 1e+19: that's 63 binary bits for you. Of course, if the differences between values in your array are more than 1, it will still be able to distinguish them at that magnitude; it simply means that whatever difference you're trying to make it distinguish must be larger than 1/2**63 of your input value.
What is the internal precision of numpy.float128? Here's an old answer that elaborates on the same thing. I've done my own test and confirmed that np.float128 is exactly a longdouble with 63 bits of precision.
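If you want to check this on your own machine, np.finfo exposes the details (a quick sketch; the exact numbers depend on your platform's long double):

import numpy as np

info = np.finfo(np.longdouble)   # np.float128 is an alias for the platform longdouble
print(info.nmant)                # mantissa bits: 63 on typical x86 extended precision
print(info.eps)                  # relative spacing between adjacent representable values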
I suggest you set a maximum for value, and if your value is larger than that, either:
reduce the value to that number, on the premise that everything in your array is going to be smaller than that number, or
throw an error.
like this:
VALUE_MAX = 1e18

def find_nearest(my_array, value):
    if value > VALUE_MAX:
        value = VALUE_MAX
    ...
Alternatively, you can choose a more scientific approach, such as actually comparing your value to the maximum of the array:
def find_nearest(my_array, value):
    my_array = np.array(my_array, dtype=np.float128)
    if value > np.amax(my_array):
        value = np.amax(my_array)
    elif value < np.amin(my_array):
        value = np.amin(my_array)
    ...
This way you can be sure you never run into this problem, since your value will always be at most as large as the maximum of your array and at least as small as its minimum.
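Putting that together, a complete sketch of the clamping variant might look like this (I use float64 here purely for portability; clamping the query into the array's own range keeps the subtraction well-behaved):

import numpy as np

def find_nearest_clamped(my_array, value):
    """Find the nearest value in an unsorted array, clamping huge inputs first."""
    my_array = np.asarray(my_array, dtype=np.float64)
    my_array = my_array[~np.isnan(my_array)]
    # Clamp the query value into the array's own range so the subtraction
    # below never mixes wildly different magnitudes.
    value = min(max(value, np.amin(my_array)), np.amax(my_array))
    return my_array[np.abs(my_array - value).argmin()]

print(find_nearest_clamped([0.0, 1.0], 1.8446744073709556e+19))   # 1.0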
The problem here doesn't seem to be that a float128 can't handle 1.844...e+19, but rather that you probably can't add two floating point numbers with such radically different scales and expect to get accurate results:
In [1]: 1.8446744073709556e+19 - 1.0 == 1.8446744073709556e+19
Out[1]: True
Your best bet, if you really need this amount of accuracy, would be to use Decimal objects and put them into a numpy array as dtype 'object':
In [1]: from decimal import Decimal
In [2]: big_num = Decimal(1.8446744073709556e+19)
In [3]: big_num # Note the slight inaccuracies due to floating point conversion
Out[3]: Decimal('18446744073709555712')
In [4]: a = np.array([Decimal(0.0), Decimal(1.0)], dtype='object')
In [5]: a[np.abs(a - big_num).argmin()]
Out[5]: Decimal('1')
Note that this will be MUCH slower than typical Numpy operations, because it has to revert to Python for each computation rather than being able to leverage its own optimized libraries (since numpy doesn't have a Decimal type).
EDIT:
If you don't need this solution and just want to know if your current code will fail, I suggest the very scientific approach of "just try":
fails = len(set(my_array)) != len(set(my_array - value))
This checks that, when you subtract value from each unique number X in my_array, you still get a unique result. That is generally true of subtraction, and if it fails it is because the floating-point arithmetic isn't precise enough to handle value - X as a number distinct from value or X.
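For instance, a quick check with the values from the question (here my_array needs to be a NumPy array so that my_array - value broadcasts):

import numpy as np

my_array = np.array([0.0, 1.0])
value = 1.8446744073709556e+19

# If subtracting value collapses distinct elements onto the same float,
# the set of differences shrinks and we know precision was lost.
fails = len(set(my_array)) != len(set(my_array - value))
print(fails)   # True here: 0.0 - value and 1.0 - value round to the same float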
I wanted to use NumPy in a Fibonacci question because of its efficiency in matrix multiplication. You know that there is a method for finding Fibonacci numbers with the matrix [[1, 1], [1, 0]].
I wrote some very simple code, but as n increases, the matrix starts to give negative numbers.
import numpy

def fib(n):
    return (numpy.matrix("1 1; 1 0")**n).item(1)

print fib(90)
# Gives -1581614984
What could be the reason for this?
Note: linalg.matrix_power also gives negative values.
Note 2: I tried numbers from 0 to 100. It starts to give negative values after 47. Is it a large-integer issue because NumPy is implemented in C? If so, how could I solve this?
Edit: Using a regular Python list matrix with linalg.matrix_power also gave negative results. Also, let me add that not all results after 47 are negative; it occurs seemingly at random.
Edit 2: I tried using the method @AlbertoGarcia-Raboso suggested. It resolved the negative-number problem, but another issue occurred: it gives the answer as -5.168070885485832e+19 where I need -51680708854858323072L. So I tried using int(); it converted it to a long (L), but now the answer seems incorrect because of a loss of precision.
The reason you see negative values appearing is that NumPy has defaulted to using the np.int32 dtype for your matrix.
The maximum positive integer this dtype can represent is 2**31 - 1, which is 2147483647. Unfortunately, this is less than the 47th Fibonacci number, 2971215073. The resulting overflow is causing the negative numbers to appear:
>>> np.int32(2971215073)
-1323752223
Using a bigger integer type (like np.int64) would fix this, but only temporarily: you'd still run into problems if you kept on asking for larger and larger Fibonacci numbers.
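To see concretely where int64 gives out, here is a quick check using a plain-Python Fibonacci (which never overflows) against the int64 limit:

import numpy as np

def fib_py(n):
    # Arbitrary-precision Python integers, so no overflow here.
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

print(np.iinfo(np.int64).max)   # 9223372036854775807
print(fib_py(92))               # 7540113804746346429  -- still fits in int64
print(fib_py(93))               # 12200160415121876738 -- already too large for int64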
The only sure fix is to use an unlimited-size integer type, such as Python's int type. To do this, modify your matrix to be of np.object type:
def fib_2(n):
    return (np.matrix("1 1; 1 0", dtype=np.object)**n).item(1)
The np.object type allows a matrix or array to hold any mix of native Python types. Essentially, instead of holding machine types, the matrix is now behaving like a Python list and simply consists of pointers to integer objects in memory. Python integers will be used in the calculation of the Fibonacci numbers now and overflow is not an issue.
>>> fib_2(300)
222232244629420445529739893461909967206666939096499764990979600
This flexibility comes at the cost of decreased performance: NumPy's speed originates from direct storage of integer/float types which can be manipulated by your hardware.
I need to use a module that does some math on integers; however, my input is in floats.
What I want to achieve is to convert a generic float value into a corresponding integer value and lose as little data as possible.
For example:
val : 1.28827339907e-08
result : 128827339906934
Which is achieved after multiplying by 1e22.
Unfortunately the range of values can change, so I cannot always multiply them by the same constant. Any ideas?
ADDED
To put it in other words, I have a matrix of values < 1, let's say from 1.323224e-8 to 3.457782e-6.
I want to convert them all into integers and lose as little data as possible.
The answers that suggest multiplying by a power of ten cause unnecessary rounding.
Multiplication by a power of the base used in the floating-point representation has no error in IEEE 754 arithmetic (the most common floating-point implementation) as long as there is no overflow or underflow.
Thus, for binary floating-point, you may be able to achieve your goal by multiplying the floating-point number by a power of two and rounding the result to the nearest integer. The multiplication will have no error. The rounding to integer may have an error up to .5, obviously.
You might select a power of two that is as large as possible without causing any of your numbers to exceed the bounds of the integer type you are using.
The most common conversion of floating-point to integer truncates, so that 3.75 becomes 3. I am not sure about Python semantics. To round instead of truncating, you might use a function such as round before converting to integer.
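Here is a minimal sketch of that idea (the helper name and the choice of 63 bits for the target integer size are my own assumptions, not from the question):

import math

def float_to_scaled_int(values, int_bits=63):
    # The largest value is m * 2**e with 0.5 <= m < 1.
    _, e = math.frexp(max(abs(v) for v in values))
    # Shift so the largest value lands just below 2**int_bits.
    shift = int_bits - e
    # Multiplying by 2**shift is exact; only the final rounding loses up to 0.5.
    return [int(round(math.ldexp(v, shift))) for v in values], shift

ints, shift = float_to_scaled_int([1.323224e-8, 3.457782e-6])
print(ints)                          # large integers, same relative proportions
print(math.ldexp(ints[1], -shift))   # divide by 2**shift to recover roughly 3.457782e-6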
If you want to preserve the values for operations on matrices I would choose some value to multiply them all by.
For Example:
1.23423
2.32423
4.2324534
Multiply them all by 10000000 and you get
12342300
23242300
42324534
You can perform your multiplications, additions, etc. with your matrices. Once you have performed all your calculations, you can convert them back to floats by dividing them all by the appropriate value, depending on the operation you performed.
Mathematically it makes sense because (scalar multiplication):
M1' = M1 * 10000000
M2' = M2 * 10000000
Result = M1' . M2'
Result = (M1 * 10000000) . (M2 * 10000000)
Result = (10000000 * 10000000) * (M1 . M2)
So in the case of multiplication you would divide your result by 10000000 * 10000000.
If it's addition / subtraction, then you simply divide by 10000000.
You can either choose the value to multiply by through your knowledge of what decimals you expect to find or by scanning the floats and generating the value yourself at runtime.
Hope that helps.
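As a rough sketch of that workflow (the scale factor and matrices below are made up for illustration):

import numpy as np

SCALE = 10000000   # chosen from knowledge of the expected decimals

m1 = np.array([[1.23423, 2.32423], [4.2324534, 1.0]])
m2 = np.array([[0.5, 2.0], [1.5, 3.0]])

# Scale up and round to integers.
m1_scaled = np.rint(m1 * SCALE).astype(np.int64)
m2_scaled = np.rint(m2 * SCALE).astype(np.int64)

# Matrix multiplication picks up a factor of SCALE * SCALE ...
product = np.dot(m1_scaled, m2_scaled)
print(product / float(SCALE * SCALE))   # ... so divide it back out to recover floats

# Addition / subtraction only picks up a single factor of SCALE.
total = m1_scaled + m2_scaled
print(total / float(SCALE))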
EDIT: If you are worried about exceeding the maximum capacity of integers, then you will be happy to know that Python automatically (and silently) converts integers to longs when it notices that overflow is about to occur. You can see for yourself in a Python console:
>>> i = 3423
>>> type(i)
<type 'int'>
>>> i *= 100000
>>> type(i)
<type 'int'>
>>> i *= 100000
>>> type(i)
<type 'long'>
If you are still worried about overflow, you can always choose a lower constant, with a compromise of slightly less accuracy (since you will be losing some digits toward the end of the decimal expansion).
Also, the method proposed by Eric Postpischil seems to make sense, but I have not tried it out myself. I gave you a solution from a more mathematical perspective, which also seems more "pythonic".
Perhaps consider counting the number of places after the decimal for each value to determine the value (x) of your exponent (1ex). Roughly something like what's addressed here. Cheers!
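A possible sketch of that idea, using decimal to read off how many decimal digits each value has (the helper name is mine, and this assumes the values round-trip cleanly through str):

from decimal import Decimal

def decimal_places(val):
    # Exponent of the Decimal form, e.g. 1.323224e-8 -> 1323224 * 10**-14 -> 14 places.
    return max(-Decimal(str(val)).as_tuple().exponent, 0)

vals = [1.323224e-8, 3.457782e-6]
x = max(decimal_places(v) for v in vals)
scale = 10 ** x
print([int(round(v * scale)) for v in vals])   # [1323224, 345778200]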
Here's one solution:
def to_int(val):
    return int(repr(val).replace('.', '').split('e')[0])
Usage:
>>> to_int(1.28827339907e-08)
128827339907
Attempting to solve this problem:
For a positive number n, define S(n) as the sum of the integers x, for which 1 < x < n and x^3 ≡ 1 mod n.
When n=91, there are 8 possible values for x, namely: 9, 16, 22, 29, 53, 74, 79, 81.
Thus, S(91)=9+16+22+29+53+74+79+81=363.
Find S(13082761331670030).
Of course, my code works for S(91), but when attempting to find S(13082761331670030) I get two different errors.
Here is my code:
def modcube(n):
    results = []
    for k in range(1, n):
        if k**3 % n == 1:
            results.append(k)
    return results
This produces OverflowError: range() result has too many items. When I try using xrange instead of range, I get an error stating Python int too large to convert to C long. I have also just tried several other things without success.
Can anyone point me in the right direction, without telling me exactly how to solve it?
No spoilers please. I've been at it for two days, my next option is to try implementing this in Java since I'm new to Python.
I think you need to understand two concepts here:
1. Integer representation in C and in Python
The implementation of Python you are using is called CPython because it is written in the C language. In C, long integers are (usually) 32 bits long, which means they can hold values between -2147483648 and 2147483647. In Python, when an integer exceeds this range, it is converted to an arbitrary-precision integer, whose size is limited only by the memory of your computer. However, operations on those arbitrary-precision integers (called long integers in Python 2) are orders of magnitude slower than operations on 32-bit integers.
2. The difference between range and xrange:
range produces a list. If you have range(10), it stores the list [0, 1, ..., 9] entirely in memory. This is why storing a list of 13082761331670030 items in memory is too much. Assuming each number takes 64 bits, it would need about 93 PB of RAM to store the entire list!
xrange produces an iterator. It returns each number one by one, which lets you perform operations on each number of the list without needing to store the entire list in memory. But again, performing calculations on 13082761331670030 different numbers could take more time than you think... The other thing about xrange is that it doesn't work with Python long integers; it is limited (for speed reasons) to 32-bit integers. This is why your program doesn't work using xrange.
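As a small illustration of the iterator idea (Python 2; the error messages are the ones quoted in the question), a hand-rolled generator avoids both limits by yielding plain Python integers one at a time:

n = 13082761331670030

# range(n)  -> OverflowError: range() result has too many items
# xrange(n) -> OverflowError: Python int too large to convert to C long

def count_up_to(limit):
    # Yields 1, 2, 3, ..., limit - 1 without ever building a list,
    # using arbitrary-precision Python integers.
    k = 1
    while k < limit:
        yield k
        k += 1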
The bottom line: Project Euler problems are (more or less) ordered by difficulty. You should begin with the lower-numbered problems first.
You wanted hints, not a solution.
Hints:
Consider that the prime factorization of 13082761331670030 is the product of the following primes: 2 x 3 x 5 x 7 x 11 x 13 x 17 x 19 x 23 x 29 x 31 x 37 x 41 x 43
Chinese remainder theorem
Just because x^3 ≡ 1 mod n does not mean that there are not other values other than 3 that satisfy this condition. Specifically, prime1 ** (prime2 - 2) % prime2
My Python solution runs in 86 milliseconds...