I have two sets of data that I am reading via nested for loops in Python. I need to match lines of the two different text files using a common number (time). In the two files, time is written differently (ex. 21:53:28.339 vs. 121082008.3399). I only need the last four digits of the times to match them up, for example from 21:53:28.339, I only need '8.339'. For the most part, indexing the number as a string works just fine (ex: timeList[nid][7:]), except for situations such as the numbers listed above, where python rounds .3399 to .34.
Is there a way for me to keep the numbers in float form and to select unrounded digits from the data?
Thanks!
edit - using Decimal exclusively - with full example
import decimal
def simplify(text):
    # might be a ':'-separated value
    text = text.split(':')[-1]
    # turn into decimal
    number = decimal.Decimal(text)
    # remove everything but the ones place and forwards
    number = number - (number / 10).quantize(decimal.Decimal(1), rounding=decimal.ROUND_FLOOR) * 10
    # truncate to the thousandths
    return number.quantize(decimal.Decimal('.001'), rounding=decimal.ROUND_FLOOR)
a = '121082008.3399'
b = '21:53:28.339'
assert simplify(a) == simplify(b)
print(simplify(a), '=', simplify(b))
Scott if you compare the numbers using strings then you don't need any floats and there will be no 'rounding' going on.
'8.339' == '8.339'
or, if you have
a = '8.3399'
b = '8.339'
then
a[:-1] == b
however if you do decide to work with them as 'numbers', then as Ignacio pointed out, you can use decimals.
from decimal import Decimal
number_a = Decimal(a[:-1])
number_b = Decimal(b)
now
number_a == number_b
Hope that helps
It appears from your description that you want to compare using one digit before the decimal point and 3 digits after the decimal point, using truncation instead of rounding. So just do that:
>>> def extract(s):
...     i = s.find('.')
...     return s[i-1:i+4]
...
>>> list(map(extract, ['21:53:28.339', '121082008.3399']))
['8.339', '8.339']
>>>
Use decimal.Decimal instead of float.
Related
I need to format a Python Decimal object so that it has at least two decimals, but no more than 5. Is there a reliable way to do this?
Examples:
1.6 --> 1.60
1.678 --> 1.678
1.98765 --> 1.98765
If there are more than two decimals, it is vital that it does not get truncated to only two decimals.
It looks to me like there are two parts to this question - one, determining the correct number of digits and two, quantizing the values to that number of digits.
To do the first, I would get the current exponent using the as_tuple() method. Unless I'm overlooking something simpler.
>>> import decimal
>>> d = decimal.Decimal("1.678")
>>> d.as_tuple().exponent
-3
>>> d2 = decimal.Decimal("1.6")
>>> d2.as_tuple().exponent
-1
So from that you can compute the desired exponent:
MAX_EXPONENT = -2
MIN_EXPONENT = -5
def desired_exponent(d):
    current_exponent = d.as_tuple().exponent
    return min(MAX_EXPONENT, max(MIN_EXPONENT, current_exponent))
The second is answered by the accepted answer on the marked duplicate - use the quantize() method. You'll need to construct a Decimal value with the desired exponent you can provide as the argument to quantize(). There are multiple ways to do that, but two simple ones are exponentiating decimal.Decimal("10") or using the tuple constructor for decimal.Decimal().
>>> quant_arg = decimal.Decimal("10") ** -2
>>> decimal.Decimal("1.6").quantize(quant_arg)
Decimal('1.60')
Or:
>>> quant_arg = decimal.Decimal((0, (), -2))
>>> decimal.Decimal("1.6").quantize(quant_arg)
Decimal('1.60')
I used -2 as a literal there; you would use the calculated value of desired_exponent instead.
There are multiple ways to organize this code, I think the parts that are not obvious are a) accessing the current exponent of a decimal value and b) some of the ways of constructing an arg for quantize(). And this is all assuming you need the actual decimal objects, and aren't just outputting them - if this is a question just about output formatting re-quantizing is probably overkill.
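Putting the two steps together, a minimal sketch might look like the following. The name clamp_decimals is hypothetical, and the sketch assumes finite Decimal inputs and the default context rounding (ROUND_HALF_EVEN) whenever more than five decimals are present.

```python
from decimal import Decimal

MAX_EXPONENT = -2  # at least two decimal places
MIN_EXPONENT = -5  # at most five decimal places

def clamp_decimals(d):
    """Re-quantize d so it has between two and five decimal places.

    Hypothetical helper; assumes d is a finite Decimal.
    """
    # Clamp the current exponent into the range [MIN_EXPONENT, MAX_EXPONENT].
    exponent = min(MAX_EXPONENT, max(MIN_EXPONENT, d.as_tuple().exponent))
    # Decimal(10) ** exponent carries exactly the exponent we quantize to.
    return d.quantize(Decimal(10) ** exponent)

for text in ("1.6", "1.678", "1.98765", "2"):
    print(text, "-->", clamp_decimals(Decimal(text)))
```

Values that already have between two and five decimals pass through unchanged, which is the property the question asks for.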
Here is the code I use now:
from decimal import Decimal
def unitAmount(value):
    """Format a Decimal to match -?[0-9]{1,15}(,[0-9]{2,5})?
    Minimum two decimals, max 5.
    """
    decimals = value.as_tuple().exponent
    if decimals > -2:  # For values like 1.6 --> 1.60 (also covers whole numbers, 2 --> 2.00)
        value = value.quantize(Decimal('1.00'))
    elif decimals < -5:  # For values like 1.1234567 --> 1.12346 (rounded by the default context)
        value = value.quantize(Decimal('1.00000'))
    return value
Basically, I have a list of float numbers with too many decimals. So when I created a second list with two decimals, Python rounded them. I used the following:
g1= ["%.2f" % i for i in g]
Where g1 is the new list with two decimals, but rounded, and g is the list with float numbers.
How can I make one without rounding them?
I'm a newbie, btw. Thanks!
So, you want to truncate the numbers at the second digit?
Beware that rounding might be the better and more accurate solution anyway.
If you want to truncate the numbers, there are a couple of ways. One of them is to multiply the number by 10 raised to the number of desired decimal places (100 for 2 places), apply "math.floor", and divide the result back by the same number.
However, as internal floating point arithmetic is not base 10, you'd risk getting more decimal places on the division to scale down.
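As a rough illustration of the multiply, floor and divide approach (the helper names are made up for this sketch): math.floor rounds towards negative infinity, so for negative numbers it is math.trunc that actually drops the extra digits.

```python
import math

def floor_to_places(x, places=2):
    # Hypothetical helper: scale up, floor (towards -infinity), scale back down.
    factor = 10 ** places
    return math.floor(x * factor) / factor

def trunc_to_places(x, places=2):
    # Hypothetical helper: same idea, but math.trunc rounds towards zero.
    factor = 10 ** places
    return math.trunc(x * factor) / factor

print(floor_to_places(7.214379), trunc_to_places(7.214379))
print(floor_to_places(-5.427926), trunc_to_places(-5.427926))
```

Both helpers still work in binary floating point. For example, 0.29 * 100 evaluates to slightly less than 29, so trunc_to_places(0.29) gives 0.28 rather than 0.29.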
Another way is to create a string with 3 digits after the "." and drop the last one - that'd be rounding proof.
And again, keep in mind that this converts the numbers to strings, which should be done for presentation purposes only. Also, "%" formatting is quite an old way to format parameters in a string. In modern Python, f-strings are the preferred way:
g1 = [f"{number:.03f}"[:-1] for number in g]
Another, more correct way, is, of course, treat numbers as numbers, and not play tricks on adding or removing digits on it. As noted in the comments, the method above would work for numbers like "1.227", that would be kept as "1.22", but not for "2.99999", which would be rounded to "3.000" and then truncated to "3.00".
Python has the decimal module, which allows arbitrary precision for decimal numbers, including less precision if needed, and gives control over how rounding is done, including rounding towards zero instead of to the nearest number.
Just set the decimal context to the decimal.ROUND_DOWN strategy, and then convert your numbers using either the round built-in (the exact number of digits is guaranteed, unlike using round with floating point numbers), or just do the rounding as part of the string formatting anyway. You can also convert your floats to Decimals in the same step:
from decimal import Decimal as D, getcontext, ROUND_DOWN
getcontext().rounding = ROUND_DOWN
g1 = [f"{D(number):.02f}" for number in g]
Again - by doing this, you could as well keep your numbers as Decimal objects, and still be able to perform math operations on them:
g2 = [round(D(number), 2) for number in g]
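A self-contained sketch of the Decimal route, passing the rounding mode to quantize directly instead of changing the global context. The helper name truncate is hypothetical. Building the Decimal from repr(value) is an assumption about what is wanted: Decimal(value) captures the float's exact binary expansion, so Decimal(0.29) truncates to 0.28, whereas Decimal('0.29') stays 0.29.

```python
from decimal import Decimal, ROUND_DOWN

def truncate(value, places=2):
    """Truncate a float towards zero, returning a Decimal with `places` decimals."""
    quantum = Decimal(10) ** -places  # e.g. Decimal('0.01') for places=2
    # repr() gives the shortest string that round-trips to the same float.
    return Decimal(repr(value)).quantize(quantum, rounding=ROUND_DOWN)

g = [-5.427926, 2.99999, 0.29, 1.5]
print([str(truncate(x)) for x in g])  # strings, for presentation only
```

Because the rounding mode is an argument to quantize, other code that relies on the default context is not affected.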
Here is my solution, where we don't even need to convert the numbers to strings to get the desired output:
def format_till_2_decimal(num):
    return int(num*100)/100.0
g = [-5.427926, -12.222018, 7.214379, -16.771845, -6.1441464, 10.1383295, 14.740516, 5.9209185, -9.740783, -10.098338]
formatted_g = [format_till_2_decimal(num) for num in g]
print(formatted_g)
Hope this solution helps!!
Here might be the answer you are looking for:
g = [-5.427926, -12.222018, 7.214379, -16.771845, -6.1441464, 10.1383295, 14.740516, 5.9209185, -9.740783, -10.098338]
def trunc(number, ndigits=2):
    parts = str(number).split('.')  # split the number into two parts, e.g. '-5' and '427926'
    truncated_number = '.'.join([parts[0], parts[1][:ndigits]])  # keep the first part and only ndigits digits of the second, then join them to get '-5.42'
    return round(float(truncated_number), ndigits)  # return a float; round makes sure it has at most ndigits decimals
g1 = [trunc(i) for i in g]
print(g1)
[-5.42, -12.22, 7.21, -16.77, -6.14, 10.13, 14.74, 5.92, -9.74, -10.09]
Hope this helps.
Actually if David's answer is what you are looking for, it can be done simply as following:
g = [-5.427926, -12.222018, 7.214379, -16.771845, -6.1441464, 10.1383295, 14.740516, 5.9209185, -9.740783, -10.098338]
g1 = [("%.3f" % i)[:-1] for i in g]
Just take 3 decimals, and remove the last chars from the result strings. (You may convert the result to float if you like)
I've been searching around for hours and I can't find a simple way of accomplishing the following.
Value 1 = 0.00531
Value 2 = 0.051959
Value 3 = 0.0067123
I want to increment each value by its smallest decimal point (however, the number must maintain the exact number of decimal points as it started with and the number of decimals varies with each value, hence my trouble).
Value 1 should be: 0.00532
Value 2 should be: 0.051960
Value 3 should be: 0.0067124
Does anyone know of a simple way of accomplishing the above in a function that can still handle any number of decimals?
Thanks.
Have you looked at the standard module decimal?
It circumvents the floating point behaviour.
Just to illustrate what can be done.
import decimal
my_number = '0.00531'
mnd = decimal.Decimal(my_number)
print(mnd)
mnt = mnd.as_tuple()
print(mnt)
mnt_digit_new = mnt.digits[:-1] + (mnt.digits[-1]+1,)
dec_incr = decimal.DecimalTuple(mnt.sign, mnt_digit_new, mnt.exponent)
print(dec_incr)
incremented = decimal.Decimal(dec_incr)
print(incremented)
prints
0.00531
DecimalTuple(sign=0, digits=(5, 3, 1), exponent=-5)
DecimalTuple(sign=0, digits=(5, 3, 2), exponent=-5)
0.00532
or a full version (after edit also carries any digit, so it also works on '0.199')...
from decimal import Decimal, getcontext
def add_one_at_last_digit(input_string):
    dec = Decimal(input_string)
    getcontext().prec = len(dec.as_tuple().digits)
    return dec.next_plus()
for i in ('0.00531', '0.051959', '0.0067123', '1', '0.05199'):
    print(add_one_at_last_digit(i))
that prints
0.00532
0.051960
0.0067124
2
0.05200
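The function above changes the precision of the global decimal context as a side effect. A possible variant, sketched here under the assumption that this side effect is unwanted, builds a throwaway Context and calls its next_plus method instead. The name add_one_at_last_digit_local is made up for this sketch.

```python
from decimal import Decimal, Context, getcontext

def add_one_at_last_digit_local(input_string):
    """Like add_one_at_last_digit, but leaves the global context untouched."""
    dec = Decimal(input_string)
    # A private context whose precision equals the number of stored digits.
    ctx = Context(prec=len(dec.as_tuple().digits))
    return ctx.next_plus(dec)

for i in ('0.00531', '0.051959', '0.0067123', '1', '0.05199'):
    print(add_one_at_last_digit_local(i))
print(getcontext().prec)  # the global precision is whatever it was before
```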
As the other commenters have noted: you should not operate with floats, because a given number such as 0.1234 is converted into an internal binary representation and you cannot process it further the way you want. This is deliberately vaguely formulated; floating point is a subject in itself. This article explains the topic very well and is a good primer on it.
That said, what you could do instead is to have the input as strings (e.g. do not convert it to float when reading from input). Then you could do this:
from decimal import Decimal
def add_one(v):
    after_comma = Decimal(v).as_tuple()[-1]*-1
    add = Decimal(1) / Decimal(10**after_comma)
    return Decimal(v) + add
if __name__ == '__main__':
    print(add_one("0.00531"))
    print(add_one("0.051959"))
    print(add_one("0.0067123"))
    print(add_one("1"))
This prints
0.00532
0.051960
0.0067124
2
Update:
If you need to operate on floats, you could try to use a fuzzy logic to come to a close presentation. decimal offers a normalize function which lets you downgrade the precision of the decimal representation so that it matches the original number:
from decimal import Decimal, Context
def add_one_float(v):
    v_normalized = Decimal(v).normalize(Context(prec=16))
    after_comma = v_normalized.as_tuple()[-1]*-1
    add = Decimal(1) / Decimal(10**after_comma)
    return Decimal(v_normalized) + add
But please note that the precision of 16 is purely experimental, you need to play with it to see if it yields the desired results. If you need correct results, you cannot take this path.
I want to print some floating point numbers so that they're always written in decimal form (e.g. 12345000000000000000000.0 or 0.000000000000012345), not in scientific notation, yet I want the result to have up to the ~15.7 significant figures of an IEEE 754 double, and no more.
What I want is ideally so that the result is the shortest string in positional decimal format that still results in the same value when converted to a float.
It is well-known that the repr of a float is written in scientific notation if the exponent is greater than 15, or less than -4:
>>> n = 0.000000054321654321
>>> n
5.4321654321e-08 # scientific notation
If str is used, the resulting string again is in scientific notation:
>>> str(n)
'5.4321654321e-08'
It has been suggested that I can use format with f flag and sufficient precision to get rid of the scientific notation:
>>> format(0.00000005, '.20f')
'0.00000005000000000000'
It works for that number, though it has some extra trailing zeroes. But then the same format fails for .1, which gives decimal digits beyond the actual machine precision of float:
>>> format(0.1, '.20f')
'0.10000000000000000555'
And if my number is 4.5678e-20, using .20f would still lose relative precision:
>>> format(4.5678e-20, '.20f')
'0.00000000000000000005'
Thus these approaches do not match my requirements.
This leads to the question: what is the easiest and also well-performing way to print arbitrary floating point number in decimal format, having the same digits as in repr(n) (or str(n) on Python 3), but always using the decimal format, not the scientific notation.
That is, a function or operation that for example converts the float value 0.00000005 to string '0.00000005'; 0.1 to '0.1'; 420000000000000000.0 to '420000000000000000.0' or 420000000000000000 and formats the float value -4.5678e-5 as '-0.000045678'.
After the bounty period: It seems that there are at least 2 viable approaches, as Karin demonstrated that using string manipulation one can achieve significant speed boost compared to my initial algorithm on Python 2.
Thus,
If performance is important and Python 2 compatibility is required; or if the decimal module cannot be used for some reason, then Karin's approach using string manipulation is the way to do it.
On Python 3, my somewhat shorter code will also be faster.
Since I am primarily developing on Python 3, I will accept my own answer, and shall award Karin the bounty.
Unfortunately it seems that not even the new-style formatting with float.__format__ supports this. The default formatting of floats is the same as with repr; and with f flag there are 6 fractional digits by default:
>>> format(0.0000000005, 'f')
'0.000000'
However there is a hack to get the desired result - not the fastest one, but relatively simple:
first the float is converted to a string using str() or repr()
then a new Decimal instance is created from that string.
Decimal.__format__ supports f flag which gives the desired result, and, unlike floats it prints the actual precision instead of default precision.
Thus we can make a simple utility function float_to_str:
import decimal
# create a new context for this task
ctx = decimal.Context()
# 20 digits should be enough for everyone :D
ctx.prec = 20
def float_to_str(f):
    """
    Convert the given float to a string,
    without resorting to scientific notation
    """
    d1 = ctx.create_decimal(repr(f))
    return format(d1, 'f')
Care must be taken to not use the global decimal context, so a new context is constructed for this function. This is the fastest way; another way would be to use decimal.localcontext, but it would be slower, creating a new thread-local context and a context manager for each conversion.
This function now returns the string with all possible digits from mantissa, rounded to the shortest equivalent representation:
>>> float_to_str(0.1)
'0.1'
>>> float_to_str(0.00000005)
'0.00000005'
>>> float_to_str(420000000000000000.0)
'420000000000000000'
>>> float_to_str(0.000000000123123123123123123123)
'0.00000000012312312312312313'
The last result is rounded at the last digit
As @Karin noted, float_to_str(420000000000000000.0) does not strictly match the format expected; it returns 420000000000000000 without trailing .0.
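If the trailing .0 matters, one possible adjustment is sketched below. The name float_to_positional is hypothetical, and the sketch assumes finite inputs only (inf and nan are not handled).

```python
import decimal

# Private context, so the global decimal context is left alone.
_ctx = decimal.Context(prec=20)

def float_to_positional(f):
    """Positional notation for a finite float, always with a decimal point."""
    text = format(_ctx.create_decimal(repr(f)), 'f')
    # Whole numbers come back without a fractional part; append one.
    return text if '.' in text else text + '.0'

for value in (0.1, 0.00000005, 420000000000000000.0, -4.5678e-5):
    print(float_to_positional(value))
```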
If you are satisfied with the precision in scientific notation, then could we just take a simple string manipulation approach? Maybe it's not terribly clever, but it seems to work (passes all of the use cases you've presented), and I think it's fairly understandable:
def float_to_str(f):
    float_string = repr(f)
    if 'e' in float_string:  # detect scientific notation
        digits, exp = float_string.split('e')
        digits = digits.replace('.', '').replace('-', '')
        exp = int(exp)
        sign = '-' if f < 0 else ''
        if exp > 0:
            # digits after the first already fill len(digits) - 1 places before the decimal point
            zero_padding = '0' * (exp - (len(digits) - 1))
            float_string = '{}{}{}.0'.format(sign, digits, zero_padding)
        else:
            zero_padding = '0' * (abs(exp) - 1)  # minus 1 for decimal point in the sci notation
            float_string = '{}0.{}{}'.format(sign, zero_padding, digits)
    return float_string
n = 0.000000054321654321
assert(float_to_str(n) == '0.000000054321654321')
n = 0.00000005
assert(float_to_str(n) == '0.00000005')
n = 420000000000000000.0
assert(float_to_str(n) == '420000000000000000.0')
n = 4.5678e-5
assert(float_to_str(n) == '0.000045678')
n = 1.1
assert(float_to_str(n) == '1.1')
n = -4.5678e-5
assert(float_to_str(n) == '-0.000045678')
Performance:
I was worried this approach may be too slow, so I ran timeit and compared with the OP's solution of decimal contexts. It appears the string manipulation is actually quite a bit faster. Edit: It appears to only be much faster in Python 2. In Python 3, the results were similar, but with the decimal approach slightly faster.
Result:
Python 2: using ctx.create_decimal(): 2.43655490875
Python 2: using string manipulation: 0.305557966232
Python 3: using ctx.create_decimal(): 0.19519368198234588
Python 3: using string manipulation: 0.2661344590014778
Here is the timing code:
from timeit import timeit
CODE_TO_TIME = '''
float_to_str(0.000000054321654321)
float_to_str(0.00000005)
float_to_str(420000000000000000.0)
float_to_str(4.5678e-5)
float_to_str(1.1)
float_to_str(-0.000045678)
'''
SETUP_1 = '''
import decimal
# create a new context for this task
ctx = decimal.Context()
# 20 digits should be enough for everyone :D
ctx.prec = 20
def float_to_str(f):
    """
    Convert the given float to a string,
    without resorting to scientific notation
    """
    d1 = ctx.create_decimal(repr(f))
    return format(d1, 'f')
'''
SETUP_2 = '''
def float_to_str(f):
    float_string = repr(f)
    if 'e' in float_string:  # detect scientific notation
        digits, exp = float_string.split('e')
        digits = digits.replace('.', '').replace('-', '')
        exp = int(exp)
        sign = '-' if f < 0 else ''
        if exp > 0:
            # digits after the first already fill len(digits) - 1 places before the decimal point
            zero_padding = '0' * (exp - (len(digits) - 1))
            float_string = '{}{}{}.0'.format(sign, digits, zero_padding)
        else:
            zero_padding = '0' * (abs(exp) - 1)  # minus 1 for decimal point in the sci notation
            float_string = '{}0.{}{}'.format(sign, zero_padding, digits)
    return float_string
'''
print(timeit(CODE_TO_TIME, setup=SETUP_1, number=10000))
print(timeit(CODE_TO_TIME, setup=SETUP_2, number=10000))
As of NumPy 1.14.0, you can just use numpy.format_float_positional. For example, running against the inputs from your question:
>>> numpy.format_float_positional(0.000000054321654321)
'0.000000054321654321'
>>> numpy.format_float_positional(0.00000005)
'0.00000005'
>>> numpy.format_float_positional(0.1)
'0.1'
>>> numpy.format_float_positional(4.5678e-20)
'0.000000000000000000045678'
numpy.format_float_positional uses the Dragon4 algorithm to produce the shortest decimal representation in positional format that round-trips back to the original float input. There's also numpy.format_float_scientific for scientific notation, and both functions offer optional arguments to customize things like rounding and trimming of zeros.
If you are prepared to lose some precision by calling str() on the float, then this is the way to go:
import decimal
def float_to_string(number, precision=20):
    return '{0:.{prec}f}'.format(
        decimal.Context(prec=100).create_decimal(str(number)),
        prec=precision,
    ).rstrip('0').rstrip('.') or '0'
It doesn't include global variables and allows you to choose the precision yourself. Decimal precision 100 is chosen as an upper bound for str(float) length. The actual supremum is much lower. The or '0' part is for the situation with small numbers and zero precision.
Note that it still has its consequences (the output below is from Python 2, where str() keeps only 12 significant digits):
>>> float_to_string(0.10101010101010101010101010101)
'0.10101010101'
Otherwise, if the precision is important, format is just fine:
import decimal
def float_to_string(number, precision=20):
    return '{0:.{prec}f}'.format(
        number, prec=precision,
    ).rstrip('0').rstrip('.') or '0'
This version does not suffer from the precision lost while calling str(f).
>>> float_to_string(0.1, precision=10)
'0.1'
>>> float_to_string(0.1)
'0.10000000000000000555'
>>> float_to_string(0.1, precision=40)
'0.1000000000000000055511151231257827021182'
>>> float_to_string(4.5678e-5)
'0.000045678'
>>> float_to_string(4.5678e-5, precision=1)
'0'
Anyway, the maximum number of decimal places is limited, since the float type itself has its limits and cannot express really long floats:
>>> float_to_string(0.1, precision=10000)
'0.1000000000000000055511151231257827021181583404541015625'
Also, whole numbers are formatted as-is.
>>> float_to_string(100)
'100'
I think rstrip can get the job done.
a=5.4321654321e-08
'{0:.40f}'.format(a).rstrip("0") # float number and delete the zeros on the right
# '0.0000000543216543210000004442039220863003' # there's roundoff error though
Let me know if that works for you.
Interesting question. To add a little more content to it, here's a little test comparing the outputs of @Antti Haapala's and @Harold's solutions:
import decimal
import math
ctx = decimal.Context()
def f1(number, prec=20):
    ctx.prec = prec
    return format(ctx.create_decimal(str(number)), 'f')
def f2(number, prec=20):
    return '{0:.{prec}f}'.format(
        number, prec=prec,
    ).rstrip('0').rstrip('.')
k = 2*8
for i in range(-2**8,2**8):
    if i<0:
        value = -k*math.sqrt(math.sqrt(-i))
    else:
        value = k*math.sqrt(math.sqrt(i))
    value_s = '{0:.{prec}E}'.format(value, prec=10)
    n = 10
    print(' | '.join([str(value), value_s]))
    for f in [f1, f2]:
        # prec starts at 0 here; Python 3's decimal rejects ctx.prec = 0 with a ValueError
        test = [f(value, prec=p) for p in range(n)]
        print('\t{0}'.format(test))
Neither of them gives "consistent" results for all cases.
With Antti's you'll see strings like '-000' or '000'
With Harold's you'll see strings like ''
I'd prefer consistency even if I'm sacrificing a little bit of speed. Depends which tradeoffs you want to assume for your use-case.
using format(float, ' .f '):
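old = 0.00000000000000000000123
text = repr(old)
if 'e-' in text:
    mantissa, exponent = text.split('e-')
    # fractional digits needed: the exponent plus the mantissa's own decimals
    decimals = int(exponent) + len(mantissa.partition('.')[2])
    new = format(old, '.' + str(decimals) + 'f')
else:
    new = text
print(old)
print(new)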
For a coding exercise I'm working on, I'm trying to compare two numbers and choose the one that has the larger number of significant digits.
For example: compare 2.37e+07 and 2.38279e+07, select 2.38279e+07 because it has more significant digits.
I don't know how to implement this in Python. I considered counting the length of each number using len(str(NUMBER)), but this method returns "10" for both of the numbers above because it doesn't differentiate between zero and non-zero digits.
How can I compare the number of significant digits in Python?
A quick and dirty approach might be len(str(NUMBER).strip('0')), which trims leading and trailing zeros and counts the remaining characters.
Because the decimal point stops the trimming and is itself counted, you'd need to remove it first: len(str(NUMBER).replace('.','').strip('0'))
However you need to bear in mind that in many cases converting a python float to a string can give you some odd behaviour, due to the way floating point numbers are handled.
If you're going to go with doing this textually, you can do the following using regular expression:
import re
l = re.compile(r'(\d*?)(0*)(\.0?)')
>>> l.match(str(2.37e+07)).groups()[0]
'237'
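An alternative that avoids hand-written string trimming is sketched below, using the decimal module. The name significant_digits is hypothetical, and the sketch assumes that trailing zeros never count as significant, since a float does not record how many digits were originally written.

```python
from decimal import Decimal

def significant_digits(x):
    """Count significant digits of a float, ignoring trailing zeros."""
    # repr() is the shortest string that round-trips to the same float;
    # normalize() strips trailing zeros from the coefficient.
    return len(Decimal(repr(x)).normalize().as_tuple().digits)

a, b = 2.37e+07, 2.38279e+07
print(significant_digits(a), significant_digits(b))
print(max(a, b, key=significant_digits))
```

Using the function as the key for max picks the number with more significant digits directly.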
Could try an algorithm like this:
sf1 = "2.3723805"
addsf1 = 0
decimal = False
for num in sf1:
    if decimal == True:
        addsf1 = addsf1 + int(num)
    if num == ".":
        decimal = True
print(addsf1)
This would check for every letter in the significant number, if you convert it to a string, that is. It will then iterate over every letter until it reaches the decimal part and then it will add the numbers together. You can do this to the other significant figure which can be used to compare the difference between the two. This would tell which one had the larger added numbers.
Not sure if this is the best solution for your problem.