Precise definition of float string formatting? - python

Is the following behavior defined in Python's documentation (Python 2.7)?
>>> '{:20}'.format(1e10)
'       10000000000.0'
>>> '{:20g}'.format(1e10)
'               1e+10'
In fact, the first result surprises me: the documentation indicates that omitting the format type ('f', 'e', etc.) for floats is equivalent to using the general format 'g'. This example shows that this does not seem to be the case, so I'm confused.
Maybe this is related to the fact that "A general convention is that an empty format string ("") produces the same result as if you had called str() on the value."? In fact:
>>> str(1e10)
'10000000000.0'
However, in the case of the {:20} format, the format string is not empty (it is 20), so I'm confused.
So, is this behavior of {:20} defined precisely in the documentation? Is the behavior of str() on floats precisely defined (str(1e11) has an exponent, but str(1e10) does not…)?
PS: My goal is to format numbers with an uncertainty so that the output is very close to what floats would give (presence or not of an exponent, etc.). However, I'm having a hard time finding the exact formatting rules.
PPS: '{:20}'.format(1e10) gives a result that differs from the string formatting '{!s:20}'.format(1e10), where the string is flushed to the left (as usual for strings) instead of to the right.

As @blckknght explains in the comments, '{:20}' specifies a field width of 20; to specify float precision you need a decimal point before the number: {:.20} or {:.20g}.
As to why the number is formatted the way it is, the OP said it: "A general convention is that an empty format string ("") produces the same result as if you had called str() on the value." That's what you're getting, space-padded as per the format spec (the presentation type is empty, and the field is wide enough to hold the full str() representation).
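To make the width-versus-precision distinction concrete, here is a minimal sketch (Python 2.7 interactive session; the padded outputs assume the default right alignment for numbers):
>>> '{:20}'.format(1e10)      # width 20, empty presentation type: str()-like text
'       10000000000.0'
>>> '{:.20g}'.format(1e10)    # precision 20 with 'g': no padding, no trailing .0
'10000000000'
>>> '{:20.3g}'.format(1e10)   # width 20 and precision 3 combined
'               1e+10'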

Related

Getting error in Python: ValueError: invalid literal for int() with base 10: '470.21'

I want to add and subtract this type of data: $12,587.30, and get the answer back in the same format. How can I do this?
Here is my code example:
print(int(col_ammount2.lstrip('$'))-int(col_ammount.lstrip('$')))
I removed the $ sign and converted the value to int, but it gives me the base 10 error.
You mentioned you want to do arithmetic operations on the numbers (addition/subtraction), so you probably want them as float instead. The difference between an integer (int) and a float is that integers do not carry decimal points.
Additionally, as @officialaimm mentioned, you need to remove the commas too. For example,
float('$3,333.33'.replace('$', '').replace(',', ''))
will give you
3333.33
So putting it into your code
print(float(col_ammount2.lstrip('$').replace(',', ''))
- float(col_ammount.lstrip('$').replace(',', '')))
An additional note for when you parse a floating point number (the same applies to integers): you may want to watch out for empty values, i.e.
float('')
is bad. One thing you can do, in case col_ammount and col_ammount2 may be empty at some point, is default them to 0 when that happens:
float(col_ammount.lstrip(...).replace(...) or 0)
You may also want to read this to learn about workarounds for the problems you can run into with floating point arithmetic: https://docs.python.org/3/tutorial/floatingpoint.html
There are two things you are missing here. Firstly, Python's int(...) cannot parse numbers containing commas, so you will need to remove them with .replace(',', ''). Secondly, int() cannot parse floating point values; you will have to use float(...) first and then, if needed, typecast to an integer using int(), math.ceil(), or math.floor(), whichever suits your needs.
Maybe something like this will solve your problem:
col_ammount2='$1,587.30'
col_ammount = '$2,567.67'
print(int(float(col_ammount2.lstrip('$').replace(',','')))-int(float(col_ammount.lstrip('$').replace(',',''))))
If you are doing these sorts of things quite often in your code, making a function as such might be handy:
integerify_currency = lambda x:int(float(x.lstrip('$').replace(',','')))
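As a sketch of the full round trip the question hints at (parse, subtract, then format back as currency), assuming a hypothetical helper name parse_currency and assuming '${:,.2f}' is the output format wanted:
def parse_currency(s):
    # hypothetical helper: '$1,587.30' -> 1587.3, empty string -> 0.0
    return float(s.lstrip('$').replace(',', '') or 0)

col_ammount2 = '$1,587.30'
col_ammount = '$2,567.67'
diff = parse_currency(col_ammount2) - parse_currency(col_ammount)
print('${:,.2f}'.format(diff))  # prints $-980.37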

Unicode int to char, leading zero

I have an integer representing a unicode character which I want to transform to the actual character so I can print it out.
However, the function unichr() gives me different behavior depending on whether there is a leading zero or not. (See the screenshot below for a better explanation.)
However, when the integer is stored in a variable I always get the first behavior, whilst I want to achieve the second. How can I do this?
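No answer is recorded here, and without the screenshot this is only a guess, but a common source of this behavior in Python 2 is that an integer literal written with a leading zero is parsed as octal; a minimal sketch of that hypothesis:
>>> unichr(65)             # decimal 65
u'A'
>>> unichr(065)            # leading zero: Python 2 reads this literal as octal (53)
u'5'
>>> n = 65                 # a variable only ever holds the decimal value
>>> unichr(n)
u'A'
>>> unichr(int('65', 8))   # reinterpret the digits as octal explicitly, if that is the goal
u'5'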

Loss of precision float in python

I have a list called scores of varying -log probabilities.
When I call this:
maxState = scores.pop(scores.index(max(scores)))
and print maxState, I realize that the maxState loses its precision as a float. Is there a way I can get the maxState without losing precision?
ex: I print out the list scores: [-35.7971525669589, -34.67875545008369]
and print maxState, I get this: -34.6787554501
(You can see it's rounded)
You are confusing string presentation with actual contents. Nowhere is precision lost, only the string produced to write to your console is using a rounded value rather than show you all digits. And always remember that float numbers are digital approximations, not precise values.
Python floats are formatted differently when using the str() and repr() functions; in a list or other container, repr() is used, but print it directly and str() is used.
If you don't like either option, format it explicitly with the format() function and specify a precision:
print format(maxState, '.12f')
to print it with 12 decimals, for example.
Demo:
>>> maxState = -34.67875545008369
>>> repr(maxState)
'-34.67875545008369'
>>> str(maxState)
'-34.6787554501'
>>> format(maxState, '.8f')
'-34.67875545'
>>> format(maxState, '.12f')
'-34.678755450084'
The repr() output is roughly equivalent to using '.17g' as the format, while str() is equivalent to '.12g'; here the precision denotes when to use scientific notation (e) and when to display in floating point notation (f).
I say roughly because the repr() output aims to give you round-trippable output; see the change notes for Python 3.1 on float() representation, which were backported to Python 2.7:
What is new is how the number gets displayed. Formerly, Python used a simple approach. The value of repr(1.1) was computed as format(1.1, '.17g') which evaluated to '1.1000000000000001'. The advantage of using 17 digits was that it relied on IEEE-754 guarantees to assure that eval(repr(1.1)) would round-trip exactly to its original value. The disadvantage is that many people found the output to be confusing (mistaking intrinsic limitations of binary floating point representation as being a problem with Python itself).
The new algorithm for repr(1.1) is smarter and returns '1.1'. Effectively, it searches all equivalent string representations (ones that get stored with the same underlying float value) and returns the shortest representation.
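A quick check (Python 2.7 session) that the stored value itself is untouched and only the default string output differs:
>>> maxState = -34.67875545008369
>>> float(repr(maxState)) == maxState   # repr() round-trips exactly
True
>>> float(str(maxState)) == maxState    # the 12-significant-digit str() does not
False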

python string formatting {:d} vs %d on floating point number

I realise that this question could be construed as similar to others, so before everyone starts pointing them out, here is a list of some possible "duplicates". None of these seem to really answer my question properly.
Python string formatting: % vs. .format
"%s" % format vs "{0}".format() vs "?" format
My question specifically pertains to the use of the string.format() method for displaying integer numbers.
Running the following code using % string formatting in an interpreter running Python 2.7:
>>> print "%d" %(1.2345)
1
Whereas using the string.format() method results in the following
>>> print "{:d}".format(1.2345)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: Unknown format code 'd' for object type 'float'
I was expecting the same behavior in both; for the interpreter to actually convert my floating point number to an integer prior to displaying. I realise that I could just use the int function to convert the floating point number to integer format, but I was looking for the same functionality you get with the %d formatting method. Is there any string.format() method that would do this for me?
The two implementations are quite separate, and some warts in the % implementation were ironed out. Using %d for floats may mask problems in your code, where you thought you had integers but got floating point values instead. Imagine a value of 1.999999 and only seeing 1 instead of 2 as %d truncates the value.
As such, the float.__format__() hook method called by str.format() to do the actual conversion work does not support the d format and raises an exception instead.
You can use the {:.0f} format to explicitly display (rounded) floating point values with no decimal places:
>>> '{:.0f}'.format(1.234)
'1'
>>> '{:.0f}'.format(1.534)
'2'
or use int() before formatting to explicitly truncate your floating point number.
As a side note, if all you are doing is formatting a number as a string (and not interpolating into a larger string), use the format() function:
>>> format(1.234, '.0f')
'1'
This communicates your intent better and is a little faster to boot.
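For comparison, a minimal sketch of both options side by side, assuming you do want %d-style truncation rather than rounding:
>>> '{:d}'.format(int(1.999999))   # explicit truncation, like %d did silently
'1'
>>> '{:.0f}'.format(1.999999)      # rounding instead
'2'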
There is an important change between 2.7 and 3.0 regarding automatic type conversion (coercion). While 2.7 was relatively relaxed about this, 3.0 forces you to be more disciplined.
Automatic conversion can be dangerous, as it may silently truncate or reduce some data! Besides, this behavior is inconsistent and you never know what to expect until you're faced with the problem. Python 3.0 requires that you specify precisely what you want to do!
However, the new string.format() adds some very powerful and useful formatting techniques. It's even very clear with the "free" format '{}'. Like this:
'{}'.format(234)
'{:10}'.format(234)
'{:<10}'.format(234)
See ? I didn't need to specify 'integer', 'float' or anything else. This will work for any type of values.
for v in (234, 1.234, 'toto'):
    for fmt in ('[{}]', '[{:10}]', '[{:<10}]', '[{:>10}]'):
        print(fmt.format(v))
Besides, the % style of formatting is obsolete and should not be used any more. The new string.format() is easier to use and has more features than the old formatting technique, which, IMHO, renders the old technique less attractive.

Converting "0x08h, 0x8ah" to [int,int] in Python

I've got a string like x='0x08h, 0x0ah' in Python and want to convert it to [8, 10] (like unsigned ints). I could split and index it like [int(a[-3:-1],16) for a in x.split(', ')], but is there a better way to convert it to a list of ints?
Would it matter if I had y='080a'?
edit (for plus points :)) what (sane) string-based hexadecimal notations does Python support, and which does it not?
You really have to know what the pattern you're trying to parse is, before you write a parser.
But it looks like your pattern is: optional 0x, then hex digits, then optional h. At least that's the most reasonable thing I can come up with that handles both '0x08h' and '080a'. So:
def parse_hex(s):
    return int(s.lstrip('0x').rstrip('h'), 16)
Then:
numbers = [parse_hex(s) for s in x.split(', ')]
Of course you don't actually need to remove the 0x prefix, because Python accepts that as part of a hex string, so you could write it as:
def parse_hex(s):
    return int(s.rstrip('h'), 16)
However, I think the intention is clearer if you're more explicit.
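Either version of parse_hex behaves the same on the inputs from the question, for example:
>>> x = '0x08h, 0x0ah'
>>> [parse_hex(s) for s in x.split(', ')]
[8, 10]
>>> parse_hex('080a')   # the prefix-less form from the edit also parses
2058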
From your edit:
edit: what (sane) string-based hexadecimal notations does Python support, and which does it not?
See the documentation for int:
Base-2, -8, and -16 literals can be optionally prefixed with 0b/0B, 0o/0O, or 0x/0X, as with integer literals in code.
That's it. (If you read the rest of the paragraph: if you're guaranteed to have 0x/0X, you don't have to explicitly use base=16. But that doesn't help you here, so that one sentence is really all you need.) The docs on Numeric Types and Numeric literals detail exactly what "as with integer literals in code" means; the only things surprising there are that negative numbers aren't literals, complex numbers aren't literals (but pure imaginary numbers are), and non-ASCII digits can be used, but the documentation doesn't explain how.
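Concretely, the notations that int() itself accepts look like this (a small sketch; the assembler-style trailing 'h' from the question is the one that is not supported):
>>> int('0x0a', 16)   # 0x prefix is accepted when the base is 16
10
>>> int('0a', 16)     # bare hex digits work too
10
>>> int('0x0a', 0)    # base 0: the prefix itself selects the base
10
>>> int('0ah', 16)    # trailing 'h' is not supported
Traceback (most recent call last):
  ...
ValueError: invalid literal for int() with base 16: '0ah'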
You can also use map:
map(lambda s: int(s.lower().replace('0x', '').replace('h', ''), 16), x.split(', '))
